Pattern Learning and Recognition on Statistical Manifolds: An Information-Geometric Review

We review the information-geometric framework for statistical pattern recognition. First, we explain the role of statistical similarity measures and distances in fundamental statistical pattern recognition problems. We then concisely review the main statistical distances and report a novel, versatile family of divergences. Depending on their intrinsic complexity, statistical patterns are learned by either atomic parametric distributions, semi-parametric finite mixtures, or non-parametric kernel density estimators. These statistical patterns are interpreted and handled geometrically in statistical manifolds as single points, weighted sparse point sets, or non-weighted dense point sets, respectively. We explain the construction of the two prominent families of statistical manifolds: the Rao Riemannian manifolds with geodesic metric distances, and the Amari-Chentsov manifolds with dual asymmetric non-metric divergences. For the latter manifolds, when considering atomic distributions from the same exponential family (including the ubiquitous Gaussian and multinomial families), we end up with dually flat exponential family manifolds that play a crucial role in many applications. We compare the advantages and disadvantages of these two approaches from the algorithmic point of view. Finally, we conclude with further perspectives on how "geometric thinking" may spur novel pattern modeling and processing paradigms.
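
To make the contrast between the two families of manifolds concrete, the following minimal sketch compares a symmetric geodesic distance with an asymmetric divergence on the simplest case, univariate Gaussians. It is an illustration under stated assumptions, not code from the paper: the function names `fisher_rao_gauss` and `kl_gauss` are hypothetical, and the formulas are the standard closed forms for the Fisher-Rao distance (from the Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, a rescaled hyperbolic half-plane) and for the Kullback-Leibler divergence between two normal distributions.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Kullback-Leibler divergence KL(N(mu1, s1^2) || N(mu2, s2^2)).

    Asymmetric in its arguments; s1 and s2 are standard deviations (> 0).
    """
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def fisher_rao_gauss(mu1, s1, mu2, s2):
    """Fisher-Rao geodesic distance between two univariate Gaussians.

    Assumes the Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, which is
    sqrt(2) times the Poincare half-plane metric in coordinates (mu/sqrt(2), sigma).
    """
    num = 0.5 * (mu1 - mu2)**2 + (s1 - s2)**2
    return math.sqrt(2) * math.acosh(1 + num / (2 * s1 * s2))

if __name__ == "__main__":
    p, q = (0.0, 1.0), (1.0, 2.0)
    print("KL(p||q)        =", kl_gauss(*p, *q))
    print("KL(q||p)        =", kl_gauss(*q, *p))          # differs from KL(p||q)
    print("Fisher-Rao(p,q) =", fisher_rao_gauss(*p, *q))
    print("Fisher-Rao(q,p) =", fisher_rao_gauss(*q, *p))  # same as above
```

The geodesic distance is a true metric (symmetric, satisfies the triangle inequality), whereas the divergence is cheaper to evaluate in general exponential families but depends on argument order, which is one aspect of the algorithmic trade-off the abstract refers to.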
