Limit of P Norm of a Continuous Function is the Max

In mathematics, a smooth maximum of an indexed family $x_1, \ldots, x_n$ of numbers is a smooth approximation to the maximum function $\max(x_1, \ldots, x_n)$: a parametric family of functions $m_\alpha(x_1, \ldots, x_n)$ such that for every $\alpha$ the function $m_\alpha$ is smooth, and the family converges to the maximum function, $m_\alpha \to \max$, as $\alpha \to \infty$. The concept of smooth minimum is defined analogously. In many cases a single family approximates both: the maximum as the parameter goes to positive infinity and the minimum as it goes to negative infinity; in symbols, $m_\alpha \to \max$ as $\alpha \to \infty$ and $m_\alpha \to \min$ as $\alpha \to -\infty$. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

Examples

Figure: smoothmax of $(-x, x)$ versus $x$ for various parameter values; very smooth for $\alpha = 0.5$, sharper for $\alpha = 8$.

For large positive values of the parameter $\alpha > 0$, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

$$\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}$$

$\mathcal{S}_\alpha$ has the following properties:

  1. $\mathcal{S}_\alpha \to \max$ as $\alpha \to \infty$
  2. $\mathcal{S}_0$ is the arithmetic mean of its inputs
  3. $\mathcal{S}_\alpha \to \min$ as $\alpha \to -\infty$
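
A minimal NumPy sketch to check these three properties numerically (the function name smooth_max and the stabilizing max-shift are implementation choices, not part of the source):

    import numpy as np

    def smooth_max(x, alpha):
        """Boltzmann average: sum(x * exp(alpha*x)) / sum(exp(alpha*x))."""
        x = np.asarray(x, dtype=float)
        z = alpha * x
        z -= z.max()  # stabilizing shift; the weights are invariant to it
        w = np.exp(z)
        return float(np.sum(x * w) / np.sum(w))

    x = [1.0, 2.0, 3.5]
    print(smooth_max(x, 50.0))   # ~3.5    (property 1: max as alpha -> +inf)
    print(smooth_max(x, 0.0))    # ~2.1667 (property 2: arithmetic mean at alpha = 0)
    print(smooth_max(x, -50.0))  # ~1.0    (property 3: min as alpha -> -inf)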

The gradient of $\mathcal{S}_\alpha$ is closely related to softmax and is given by

$$\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[1 + \alpha \left(x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n)\right)\right].$$

This closed-form, softmax-weighted gradient makes $\mathcal{S}_\alpha$ convenient for optimization techniques that use gradient descent.
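
One can sanity-check this closed-form gradient against central finite differences; the following sketch assumes NumPy, and the step size and tolerance are arbitrary illustrative choices:

    import numpy as np

    def smooth_max_and_grad(x, alpha):
        """Return S_alpha(x) and its gradient via the softmax-based formula."""
        x = np.asarray(x, dtype=float)
        z = alpha * x
        z -= z.max()                        # softmax is invariant to this shift
        w = np.exp(z)
        p = w / np.sum(w)                   # softmax weights
        s = float(np.dot(p, x))             # S_alpha(x)
        grad = p * (1.0 + alpha * (x - s))  # gradient formula from above
        return s, grad

    x = np.array([0.3, -1.2, 0.9])
    alpha, eps = 2.0, 1e-6
    s, grad = smooth_max_and_grad(x, alpha)

    fd = np.zeros_like(x)                   # central finite differences
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        fd[i] = (smooth_max_and_grad(x + e, alpha)[0]
                 - smooth_max_and_grad(x - e, alpha)[0]) / (2 * eps)
    print(np.allclose(grad, fd, atol=1e-6))  # True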

LogSumExp

Another smooth maximum is LogSumExp:

$$\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log\left(\exp(\alpha x_1) + \cdots + \exp(\alpha x_n)\right)$$
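
Evaluated naively, the exponentials overflow for inputs of even moderate size; the standard remedy is to factor the largest term out of the sum before exponentiating. A minimal sketch, assuming NumPy:

    import numpy as np

    def lse(x, alpha=1.0):
        """(1/alpha) * log(sum(exp(alpha * x_i))), with a max-shift for stability."""
        z = alpha * np.asarray(x, dtype=float)
        m = z.max()
        return float((m + np.log(np.sum(np.exp(z - m)))) / alpha)

    print(lse([1.0, 2.0, 3.5], alpha=100.0))  # ~3.5: tight bound on the max
    print(lse([1000.0, 1001.0]))              # ~1001.31; naive exp(1000) overflows

For $\alpha > 0$, $\mathrm{LSE}_\alpha$ always lies above the true maximum, but exceeds it by at most $\log(n)/\alpha$, so larger $\alpha$ gives a tighter (and less smooth) approximation.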

This can also be normalized if the $x_i$ are all non-negative, yielding a function with domain $[0, \infty)^n$ and range $[0, \infty)$:

$$g(x_1, \ldots, x_n) = \log\left(\exp(x_1) + \cdots + \exp(x_n) - (n - 1)\right)$$

The $(n-1)$ term corrects for the fact that $\exp(0) = 1$: it cancels all but one of the zero exponentials, so that $g(x_1, \ldots, x_n) = \log 1 = 0$ when all $x_i$ are zero.
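
A short check of this normalized variant (the helper name g mirrors the formula above; this naive version makes no attempt to avoid overflow):

    import numpy as np

    def g(x):
        """log(sum(exp(x_i)) - (n - 1)) for non-negative inputs."""
        x = np.asarray(x, dtype=float)
        return float(np.log(np.sum(np.exp(x)) - (len(x) - 1)))

    print(g([0.0, 0.0, 0.0]))  # 0.0: the (n-1) term cancels all but one exp(0)
    print(g([0.0, 0.0, 5.0]))  # 5.0: the two exp(0) terms cancel exactly here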

p-Norm

Another smooth maximum is the p-norm:

$$\|(x_1, \ldots, x_n)\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}$$

which converges to $\|(x_1, \ldots, x_n)\|_\infty = \max_{1 \leq i \leq n} |x_i|$ as $p \to \infty$.

An advantage of the p-norm is that it is a norm. As such it is "scale invariant" (homogeneous): $\|(\lambda x_1, \ldots, \lambda x_n)\|_p = |\lambda| \cdot \|(x_1, \ldots, x_n)\|_p$, and it satisfies the triangle inequality.
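
The convergence to the maximum is easy to observe numerically. In the sketch below the largest magnitude is factored out before raising to the power $p$, so that large $p$ does not overflow (an implementation choice, not part of the source):

    import numpy as np

    def p_norm(x, p):
        """(sum |x_i|^p)^(1/p), factoring out max|x_i| for numerical stability."""
        a = np.abs(np.asarray(x, dtype=float))
        m = a.max()
        if m == 0.0:
            return 0.0
        return float(m * np.sum((a / m) ** p) ** (1.0 / p))

    x = [1.0, -2.0, 3.5]
    for p in (1, 2, 8, 64, 512):
        print(p, p_norm(x, p))  # decreases toward max|x_i| = 3.5 as p grows

Like LogSumExp, the p-norm overestimates the true maximum, here by at most a factor of $n^{1/p}$.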

Other choices of smoothing function

$$\mathrm{max}_\alpha(x_1, x_2) = \frac{(x_1 + x_2) + \sqrt{(x_1 - x_2)^2 + \alpha}}{2}$$ [1]

where $\alpha > 0$ is a smoothing parameter. As $\alpha \to 0$, $\mathrm{max}_\alpha(x_1, x_2) \to \max(x_1, x_2)$, since $\sqrt{(x_1 - x_2)^2} = |x_1 - x_2|$ and $\big((x_1 + x_2) + |x_1 - x_2|\big)/2 = \max(x_1, x_2)$.
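
A minimal sketch of this two-argument smooth maximum (the name smooth_max2 is a hypothetical label for illustration):

    import numpy as np

    def smooth_max2(x1, x2, alpha):
        """((x1 + x2) + sqrt((x1 - x2)^2 + alpha)) / 2."""
        return ((x1 + x2) + np.sqrt((x1 - x2) ** 2 + alpha)) / 2

    for alpha in (1.0, 0.1, 0.001):
        print(alpha, smooth_max2(1.0, 3.0, alpha))  # -> max(1, 3) = 3 as alpha -> 0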

See also

  • LogSumExp
  • Softmax function
  • Generalized mean

References

  1. Biswas, Koushik; Kumar, Sandeep; Banerjee, Shilpak; Pandey, Ashish Kumar (2021). "SMU: Smooth activation function for deep networks using smoothing maximum technique". arXiv:2111.04682.

  • Cook, John D. https://www.johndcook.com/soft_maximum.pdf
  • Lange, M.; Zühlke, D.; Holz, O.; Villmann, T. (2014). "Applications of lp-norms and their smooth approximations for gradient based learning vector quantization". Proc. ESANN, April 2014, pp. 271–276. https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-153.pdf


Source: https://en.wikipedia.org/wiki/Smooth_maximum
