Q-Divergence

Introduction
The most commonly used result in Information Geometry is the so-called i-projection story. If we have a distribution $$ r_{\sf y}(\cdot) $$, and an exponential family $$ {\cal E} = \{p(\cdot; x)\} $$ with parameter $$ x $$ and natural statistic $$ t(\cdot) $$, i.e.

$$ p_{\sf y}(y; x) = \exp ( x \cdot t(y) -\alpha(x) + \beta(y) ) $$

then if we compute the maximum likelihood fit of $$ r_{\sf y}(\cdot) $$:

$$ x^* = \arg\min_x D_{\rm KL} ( r_{\sf y}(\cdot) || p_{\sf y}(\cdot; x)) $$
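To see why minimizing this divergence is the maximum likelihood fit, expand it using the exponential form of $$ p_{\sf y}(\cdot; x) $$:

$$ D_{\rm KL} ( r_{\sf y} || p_{\sf y}(\cdot; x)) = -H(r_{\sf y}) - x \cdot {\mathbb E}_{r_{\sf y}}[t({\sf y})] + \alpha(x) - {\mathbb E}_{r_{\sf y}}[\beta({\sf y})] $$

Only $$ -x \cdot {\mathbb E}_{r_{\sf y}}[t({\sf y})] + \alpha(x) $$ depends on $$ x $$, so minimizing the divergence is the same as maximizing the expected log-likelihood $$ {\mathbb E}_{r_{\sf y}}[\log p_{\sf y}({\sf y}; x)] $$. Setting the derivative in $$ x $$ to zero and using the standard identity $$ \alpha'(x) = {\mathbb E}_{p_{\sf y}(\cdot; x)}[t({\sf y})] $$ gives exactly the moment-matching condition below.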

Then we have $$ {\mathbb E}_{r_{\sf y}}[t({\sf y})] = {\mathbb E}_{p_{\sf y}(\cdot; x^*)} [t({\sf y})] $$. We say that $$ r_{\sf y} $$ and $$ p_{\sf y} (\cdot; x^*) $$ belong to the same linear family (some call this a "mixture family"). Moreover, there is a nice Pythagorean identity: for any $$ p_{\sf y} (\cdot ; x) \in {\cal E} $$,

$$ D_{\rm KL} (r_{\sf y} || p_{\sf y}(\cdot; x) ) = D_{\rm KL} (r_{\sf y} || p_{\sf y}(\cdot; x^*)) + D_{\rm KL} (p_{\sf y}(\cdot; x^*) || p_{\sf y}(\cdot; x)) $$
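On a discrete finite alphabet both claims are easy to check numerically. Below is a minimal sketch in plain Python; the alphabet, the statistic $$ t $$, the carrier term $$ \beta $$, and the target distribution $$ r_{\sf y} $$ are all illustrative choices of mine, and the minimizer is found by bisection on the moment condition.

```python
import math

# Discrete alphabet {0,1,2,3}; t, beta, r are illustrative choices.
t = [0.0, 1.0, 2.0, 3.0]          # natural statistic t(y)
beta = [0.0, 0.0, 0.0, 0.0]       # carrier term beta(y)
r = [0.1, 0.2, 0.3, 0.4]          # target distribution r(y)

def p(x):
    # p(y; x) = exp(x*t(y) - alpha(x) + beta(y)); dividing by Z = exp(alpha(x))
    w = [math.exp(x * ti + bi) for ti, bi in zip(t, beta)]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean_t(q):
    return sum(qi * ti for qi, ti in zip(q, t))

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

# d/dx D(r || p(.;x)) = E_{p(.;x)}[t] - E_r[t], and E_{p(.;x)}[t] is
# increasing in x, so the minimizer solves E_p[t] = E_r[t]; bisection works.
target = mean_t(r)
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_t(p(mid)) < target else (lo, mid)
x_star = 0.5 * (lo + hi)

print(mean_t(p(x_star)), target)   # moment matching: the two values agree

x = 0.7                            # any other member of the family
# Pythagorean identity: the two sides agree
print(kl(r, p(x)), kl(r, p(x_star)) + kl(p(x_star), p(x)))
```

The bisection exploits that $$ \alpha(x) $$ is convex, so the mean of $$ t $$ under $$ p(\cdot; x) $$ is monotone in $$ x $$.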

This nice result allows us to visualize the exponential family as "orthogonal to" the linear family, and to talk about maximum likelihood estimation in the language of i-projection and m-projection. Such a geometric view is extremely powerful. For example, it allows us to understand the EM algorithm and the Blahut-Arimoto algorithm as alternating projections.
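As a sketch of that last point, here is a tiny Blahut-Arimoto loop for channel capacity written as two alternating updates, which can be read as alternating projections in the Csiszár-Tusnády sense. The binary symmetric channel and its crossover probability 0.1 are illustrative choices of mine.

```python
import math

# Toy binary symmetric channel: W[x][y] = W(y|x), crossover 0.1 (illustrative).
W = [[0.9, 0.1], [0.1, 0.9]]
p = [0.3, 0.7]   # input distribution; deliberately non-uniform start

for _ in range(300):
    # one projection: posterior q(x|y) proportional to p(x) * W(y|x)
    q = [[p[x] * W[x][y] for x in range(2)] for y in range(2)]
    q = [[v / sum(row) for v in row] for row in q]
    # other projection: p(x) proportional to exp( sum_y W(y|x) log q(x|y) )
    logw = [sum(W[x][y] * math.log(q[y][x]) for y in range(2)) for x in range(2)]
    m = max(logw)
    w = [math.exp(l - m) for l in logw]   # subtract max for numerical stability
    p = [wi / sum(w) for wi in w]

def mutual_info_bits(p, W):
    py = [sum(p[x] * W[x][y] for x in range(2)) for y in range(2)]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / py[y])
               for x in range(2) for y in range(2))

# For this channel the capacity-achieving input is uniform, with
# capacity 1 - h(0.1) bits, and the iteration recovers both.
print(p, mutual_info_bits(p, W))
```

Each half-update fixes one argument of a KL objective and minimizes over the other, which is exactly the alternating i-/m-projection picture.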

I am writing this page because there is in fact a lot more to this story. As part of his work, Amari defined a notion of $$ \alpha $$-divergence and the q-exponential family, which generalize the picture above. The best survey I found is the paper titled "Geometry of q-Exponential Family of Probability Distributions". What I hope to do here is present a simpler version of that paper, focusing on discrete finite alphabets and 1-dimensional families. By doing so, I hope to emphasize the key conceptual steps and discuss where we can possibly use such strong tools.