Divergences

Divergences.jl

Divergences is a Julia package that makes it easy to evaluate divergences and their derivatives.

Definition

A divergence between \(a\in \mathbb{R}^n\) and \(b\in\mathbb{R}^n\) is defined as \[ D(a,b) = \sum_{i=1}^n \gamma(a_i/b_i) b_i, \] where \(\gamma:(a_{\gamma},+\infty)\to\mathbb{R}_{+}\), with \(a_{\gamma}\in\mathbb{R}\), is strictly convex and twice continuously differentiable on the interior of its domain. The divergence function \(\gamma\) is normalized so as to satisfy \(\gamma(1) = 0\), \(\gamma'(1)=0\), and \(\gamma''(1)=1\).

The gradient and the Hessian of the divergence with respect to \(a\) are given by \[ \nabla_{a}D(a,b)\equiv\left.\frac{\partial D(u,v)}{\partial u}\right|_{u=a,v=b}=\begin{pmatrix}\gamma'(a_{1}/b_{1})\\ \gamma'(a_{2}/b_{2})\\ \vdots\\ \gamma'(a_{n}/b_{n}) \end{pmatrix}\] and \[ \nabla_{a}^{2}D(a,b)\equiv\left.\frac{\partial^{2}D(u,v)}{\partial u\,\partial u'}\right|_{u=a,v=b}=\begin{pmatrix}\frac{\gamma''(a_{1}/b_{1})}{b_{1}} & 0 & \cdots & 0\\ 0 & \frac{\gamma''(a_{2}/b_{2})}{b_{2}} & 0 & \vdots\\ \vdots & 0 & \ddots & 0\\ 0 & \cdots & 0 & \frac{\gamma''(a_{n}/b_{n})}{b_{n}} \end{pmatrix}, \] respectively. Given the normalization \(\gamma'(1)=0\) and \(\gamma''(1)=1\), we have that \[ \nabla_{a}D(a,a) = 0, \quad \nabla^2_{a}D(a,a) = \mathrm{diag}(1/a_1,\ldots,1/a_n). \]
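To make these formulas concrete, here is a minimal Julia sketch that evaluates \(D(a,b)\), its gradient, and its Hessian diagonal for a hand-coded KL \(\gamma\). The names `D`, `grad`, and `hdiag` are illustrative and not part of the package.

```julia
# Minimal sketch of the definitions above for the KL divergence
# γ(u) = u log(u) - u + 1; illustrative code, not the package internals.
γ(u)  = u * log(u) - u + 1
γ′(u) = log(u)
γ″(u) = 1 / u

D(a, b)     = sum(γ.(a ./ b) .* b)   # D(a, b) = Σᵢ γ(aᵢ/bᵢ) bᵢ
grad(a, b)  = γ′.(a ./ b)            # gradient entries γ′(aᵢ/bᵢ)
hdiag(a, b) = γ″.(a ./ b) ./ b       # Hessian diagonal γ″(aᵢ/bᵢ)/bᵢ

a = [0.2, 0.4, 0.4]; b = [0.1, 0.3, 0.6]
D(a, b)        # ≈ 0.0915
grad(a, b)     # [log(2), log(4/3), log(2/3)]
hdiag(a, b)    # [5.0, 2.5, 2.5] (for KL this equals 1 ./ a)
```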

The divergences implemented in the package are (i) the Kullback-Leibler (\(KL\)), (ii) the reverse Kullback-Leibler (\(RKL\)), (iii) the Hellinger (\(\mathscr{H}\)), (iv) the chi-squared \((\chi^2)\), and (v) the Cressie-Read (\(CR\)) family of divergences. The table below reports these divergences together with their first- and second-order derivatives.

| | \(\gamma(u)\) | Domain | \(\gamma'(u)\) | \(\gamma''(u)\) |
| --- | --- | --- | --- | --- |
| \(KL\) | \(u \log(u) - u + 1\) | \((0,+\infty)\) | \(\log(u)\) | \(\frac{1}{u}\) |
| \(RKL\) | \(-\log(u) + u - 1\) | \((0,+\infty)\) | \(1-\frac{1}{u}\) | \(\frac{1}{u^2}\) |
| \(\mathscr{H}\) | \(2u - 4\sqrt{u} + 2\) | \((0,+\infty)\) | \(2\left(1 - \frac{1}{\sqrt{u}}\right)\) | \(\frac{1}{u^{3/2}}\) |
| \(\chi^2\) | \(\frac{1}{2}(u - 1)^2\) | \((-\infty,+\infty)\) | \(u - 1\) | \(1\) |
| \(CR\) | \(\frac{u^{1+\alpha} + \alpha - u(1+\alpha)}{\alpha(1+\alpha)}\) | \((0,+\infty)\) | \(\frac{u^\alpha - 1}{\alpha}\) | \(u^{\alpha-1}\) |

The Cressie-Read family of divergences is indexed by \(\alpha\in\mathbb{R}\). It contains the chi-squared divergence (\(\alpha = 1\)), the Kullback-Leibler divergence (\(\alpha \to 0\)), the reverse Kullback-Leibler divergence (\(\alpha \to -1\)), and the Hellinger divergence (\(\alpha = -1/2\)).

When \(\alpha<0\), the Cressie-Read \(\gamma\) is not convex on \((-\infty, 0)\), so we set \(\gamma(u)=+\infty\) on that region.
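The special cases can be checked numerically with the closed forms from the table; this is an illustrative sketch in which the names `γ_CR`, `γ_KL`, `γ_H`, and `γ_CHI` are local, not package types:

```julia
# Numerical checks of the Cressie-Read special cases, using the closed
# forms from the table above; all names are local to this sketch.
γ_CR(u, α) = (u^(1 + α) + α - u * (1 + α)) / (α * (1 + α))
γ_KL(u)  = u * log(u) - u + 1
γ_H(u)   = 2u - 4 * sqrt(u) + 2
γ_CHI(u) = (u - 1)^2 / 2

u = 1.7
γ_CR(u, 1.0)  ≈ γ_CHI(u)                        # true: α = 1 is chi-squared
γ_CR(u, -0.5) ≈ γ_H(u)                          # true: α = -1/2 is Hellinger
isapprox(γ_CR(u, 1e-6), γ_KL(u); rtol = 1e-5)   # true: α → 0 recovers KL
```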

Convex Conjugate

The convex conjugate of \(\gamma\) is defined as \[ \gamma^*(z) = \sup_{u\in\mathbb{R}} \left\{zu - \gamma(u)\right\}. \] For a twice continuously differentiable \(\gamma\), the convex conjugate is \[ \gamma^*(z) = z\,(\gamma')^{-1}(z) - \gamma\left((\gamma')^{-1}(z)\right), \] where \((\gamma')^{-1}(z) := \{u: \gamma'(u) = z\}\). The domain of \(\gamma^*\) is \((-\infty, d)\), where \(d = \lim_{u\to+\infty} \gamma(u)/u\).

The first derivative of the convex conjugate \(\gamma^*(z)\) can be found as: \[ \frac{d}{dz} \gamma^*(z) = (\gamma')^{-1}(z). \]

The second derivative can be derived using the inverse function theorem: \[ \frac{d^2}{dz^2} \gamma^*(z) = \frac{1}{\gamma''((\gamma')^{-1}(z))}. \]
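As an illustration, for KL we have \(\gamma'(u)=\log(u)\), so \((\gamma')^{-1}(z)=e^z\) and the conjugate formula yields \(\gamma^*(z)=e^z-1\). A small self-contained check, in which the names `γstar` and `γinv` are local to the sketch:

```julia
# Illustrative check of the conjugate formulas for KL, where
# (γ′)⁻¹(z) = exp(z) and hence γ*(z) = exp(z) - 1.
γ(u)     = u * log(u) - u + 1
γinv(z)  = exp(z)                       # (γ′)⁻¹(z)
γstar(z) = z * γinv(z) - γ(γinv(z))     # conjugate via the formula above

z = 0.3
γstar(z) ≈ exp(z) - 1                            # true
h = 1e-6                                         # finite-difference step
(γstar(z + h) - γstar(z - h)) / (2h) ≈ γinv(z)   # true: dγ*/dz = (γ′)⁻¹(z)
```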

Modified divergences

| | \(\gamma^*(z)\) | \(\lim_{u \to \infty} \frac{\gamma(u)}{u}\) | \(\lim_{u \to \infty} \frac{u \gamma'(u)}{\gamma(u)}\) |
| --- | --- | --- | --- |
| \(KL\) | \(e^z - 1\) | \(+\infty\) | \(1\) |
| \(RKL\) | \(-\log(1 - z)\) | \(1\) | \(1\) |
| \(\mathscr{H}\) | \(\frac{2z}{2-z}\) | \(2\) | \(1\) |
| \(\chi^2\) | \(z + \frac{z^2}{2}\) | \(+\infty\) | \(2\) |
| \(CR\) | \(\frac{(1+\alpha z)^{(1+\alpha)/\alpha} - 1}{1+\alpha}\) | \(\begin{cases} +\infty & \alpha \geq 0 \\ -1/\alpha & \alpha < 0 \end{cases}\) | \(\begin{cases} 1+\alpha & \alpha>0 \\ 1 & \mathrm{otherwise}\end{cases}\) |

For many of the divergences defined above, the effective domain of the conjugate \(\gamma^*\) does not span \(\mathbb{R}\), since \(\gamma(u)/u \to l < +\infty\) as \(u \to +\infty\).

For some \(\vartheta>0\), let \(u_{\vartheta}\equiv 1+\vartheta\). The modified divergence \(\gamma_{\vartheta}\) is defined as \[ \gamma_{\vartheta}(u) = \begin{cases} \gamma(u_{\vartheta}) + \gamma'(u_{\vartheta})(u-u_{\vartheta}) + \frac{1}{2}\gamma''(u_{\vartheta})(u-u_{\vartheta})^2, & u\geqslant u_{\vartheta},\\ \gamma(u), & u\in (0,u_{\vartheta}),\\ \lim_{u\to 0^{+}} \gamma(u), & u=0, \\ +\infty, & u<0. \end{cases} \]

It is immediate to verify that this divergence still satisfies all the requirements and normalization of \(\gamma\). Furthermore, it holds that \[ \lim_{u\to\infty}\frac{\gamma_{\vartheta}(u)}{u} = +\infty, \qquad \text{and}\qquad \lim_{u\to\infty}\frac{u\gamma'_{\vartheta}(u)}{\gamma_{\vartheta}(u)} = 2. \]
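A minimal sketch of \(\gamma_{\vartheta}\) for KL, following the piecewise definition above; the function name `γϑ` and the default \(\vartheta = 0.5\) are illustrative, and the package's own modified types may differ in interface:

```julia
# Minimal sketch of the modified divergence γ_ϑ for KL, following the
# piecewise definition above; illustrative only.
γ(u)  = u * log(u) - u + 1
γ′(u) = log(u)
γ″(u) = 1 / u

function γϑ(u; ϑ = 0.5)
    uϑ = 1 + ϑ
    if u ≥ uϑ               # quadratic extension beyond uϑ
        γ(uϑ) + γ′(uϑ) * (u - uϑ) + γ″(uϑ) * (u - uϑ)^2 / 2
    elseif u > 0            # original divergence on (0, uϑ)
        γ(u)
    elseif u == 0           # right limit at zero (equals 1 for KL)
        1.0
    else                    # off the domain
        Inf
    end
end

γϑ(1.0)     # 0.0: the normalization is preserved
γϑ(10.0)    # the quadratic tail makes γϑ(u)/u → ∞
```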

The first limit implies that the image of \(\gamma'_{\vartheta}\) is the real line and thus \(\overline{\mathrm{dom}\,\gamma^*_{\vartheta}}=(-\infty,+\infty)\). The expression for the conjugate is obtained by applying the Legendre-Fenchel transform: \[ \gamma_{\vartheta}^*(\upsilon) = \begin{cases} a_{\vartheta}\upsilon^2 + b_{\vartheta}\upsilon + c_{\vartheta}, & \upsilon>\gamma'(u_{\vartheta}),\\ \gamma^*(\upsilon), & \upsilon\leqslant \gamma'(u_{\vartheta}), \end{cases} \]

where \(a_{\vartheta} = 1/(2\gamma''(u_{\vartheta}))\), \(b_{\vartheta}=u_{\vartheta} - 2a_{\vartheta}\gamma'(u_{\vartheta})\), and \(c_{\vartheta}=a_{\vartheta}\gamma'(u_{\vartheta})^2 - \gamma(u_{\vartheta})\). The conjugate \(\gamma_{\vartheta}^*(\upsilon)\) has a closed-form expression whenever the conjugate of the original divergence does.
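Continuing the KL sketch, one can verify that the quadratic branch of \(\gamma^*_{\vartheta}\) meets \(\gamma^*(\upsilon)=e^{\upsilon}-1\) at the knot \(\upsilon=\gamma'(u_{\vartheta})\); this is a self-contained, illustrative check of the coefficients above:

```julia
# Check that the quadratic branch of γϑ* matches γ*(υ) = exp(υ) - 1 at
# the knot υ = γ′(uϑ); KL example, self-contained and illustrative.
γ(u)  = u * log(u) - u + 1
γ′(u) = log(u)
γ″(u) = 1 / u

ϑ  = 0.5
uϑ = 1 + ϑ
aϑ = 1 / (2 * γ″(uϑ))
bϑ = uϑ - 2 * aϑ * γ′(uϑ)
cϑ = aϑ * γ′(uϑ)^2 - γ(uϑ)            # aϑ γ′(uϑ)² - γ(uϑ)

υ = γ′(uϑ)                            # the knot between the two branches
aϑ * υ^2 + bϑ * υ + cϑ ≈ exp(υ) - 1   # true: the branches agree
```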

Fully modified divergences

For some \(\vartheta>0\) and \(0 < \varphi < 1-a_{\gamma}\), let \(u_{\vartheta}\equiv 1+\vartheta\) and \(u_{\varphi} \equiv a_{\gamma} + \varphi\). The fully modified divergence \(\gamma_{\varphi, \vartheta}\) is defined as \[ \gamma_{\varphi,\vartheta}(u) = \begin{cases} \gamma(u_{\vartheta}) + \gamma'(u_{\vartheta})(u-u_{\vartheta}) + \frac{1}{2}\gamma''(u_{\vartheta})(u-u_{\vartheta})^2, & u\geqslant u_{\vartheta},\\ \gamma(u), & u\in (u_{\varphi},u_{\vartheta}),\\ \gamma(u_{\varphi}) + \gamma'(u_{\varphi})(u-u_{\varphi}) + \frac{1}{2}\gamma''(u_{\varphi})(u-u_{\varphi})^2, & u\leqslant u_{\varphi}. \end{cases} \] It is immediate to verify that this divergence still satisfies all the requirements and normalization of \(\gamma\), while being defined on all of \(\mathbb{R}\).
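A sketch of \(\gamma_{\varphi,\vartheta}\) for KL, where \(a_{\gamma}=0\) so \(u_{\varphi}=\varphi\); the name `γφϑ` and the default parameter values are illustrative:

```julia
# Sketch of the fully modified divergence γ_{φ,ϑ} for KL (here a_γ = 0,
# so uφ = a_γ + φ = φ); illustrative only.
γ(u)  = u * log(u) - u + 1
γ′(u) = log(u)
γ″(u) = 1 / u
quad(u, c) = γ(c) + γ′(c) * (u - c) + γ″(c) * (u - c)^2 / 2

function γφϑ(u; φ = 0.1, ϑ = 0.5)
    uφ, uϑ = φ, 1 + ϑ
    if u ≥ uϑ
        quad(u, uϑ)         # right quadratic tail
    elseif u > uφ
        γ(u)                # original divergence on (uφ, uϑ)
    else
        quad(u, uφ)         # left quadratic tail: defined on all of ℝ
    end
end

γφϑ(1.0)     # 0.0
γφϑ(-3.0)    # finite: the left branch extends the domain to ℝ
```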

Using the Divergences package

using Divergences

Suppose \(a = [0.2, 0.4, 0.4]\) and \(b = [0.1, 0.3, 0.6]\).

a = [0.2, 0.4, 0.4]
b = [0.1, 0.3, 0.6]
3-element Vector{Float64}:
 0.1
 0.3
 0.6

We instantiate the divergence and evaluate it:

KL = KullbackLeibler()
D = KL(a, b)
0.09151622184943564

To evaluate the gradient and the Hessian:

Divergences.gradient(KL, a, b)
3-element Vector{Float64}:
  0.6931471805599453
  0.287682072451781
 -0.4054651081081643
Divergences.hessian(KL, a, b)
3-element Vector{Float64}:
 5.0
 2.5
 2.5
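As a sanity check, these outputs agree with the closed forms in the KL row of the first table: the gradient entries are \(\log(a_i/b_i)\) and, since the Hessian is diagonal, the returned vector holds the diagonal entries \(\gamma''(a_i/b_i)/b_i = 1/a_i\):

```julia
# Sanity checks against the closed forms for KL (illustrative)
KL(a, b) ≈ sum(a .* log.(a ./ b) .- a .+ b)     # true
Divergences.gradient(KL, a, b) ≈ log.(a ./ b)   # true
Divergences.hessian(KL, a, b) ≈ 1 ./ a          # true: diagonal entries
```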

Application to the fully modified \(\gamma_{\varphi,\vartheta}\):

For the piecewise function \(\gamma_{\varphi,\vartheta}(u)\):

1. In each quadratic segment (i.e., for \(u \geqslant u_{\vartheta}\) or \(u \leqslant u_{\varphi}\)), apply the standard convex conjugate formula to the quadratic approximation.
2. For the middle segment \(u \in (u_{\varphi}, u_{\vartheta})\), the conjugate coincides with the conjugate of the base function \(\gamma(u)\) on that range.

The piecewise nature adds complexity, but the general approach remains consistent.
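A sketch of this segment-wise construction for KL, reusing the quadratic-conjugate coefficients derived for \(\gamma^*_{\vartheta}\) above; the names `quadconj` and `γφϑstar` are illustrative, not package API:

```julia
# Segment-wise conjugate of the fully modified KL divergence: quadratic
# outside [γ′(uφ), γ′(uϑ)], and the base conjugate exp(υ) - 1 in between.
# Purely illustrative, not the package's implementation.
γ(u)  = u * log(u) - u + 1
γ′(u) = log(u)
γ″(u) = 1 / u

# conjugate of the quadratic branch expanded around the knot c
function quadconj(υ, c)
    a = 1 / (2 * γ″(c))
    b = c - 2 * a * γ′(c)
    k = a * γ′(c)^2 - γ(c)
    a * υ^2 + b * υ + k
end

function γφϑstar(υ; φ = 0.1, ϑ = 0.5)
    uφ, uϑ = φ, 1 + ϑ
    if υ ≥ γ′(uϑ)
        quadconj(υ, uϑ)      # upper quadratic segment
    elseif υ > γ′(uφ)
        exp(υ) - 1           # middle segment: the base conjugate γ*(υ)
    else
        quadconj(υ, uφ)      # lower quadratic segment
    end
end
```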