Information-geometric methods for mixture models (Méthodes de géométrie de l'information pour les modèles de mélange)

This thesis presents new methods for learning mixture models, based on information geometry. The mixture models considered here are mixtures of exponential families, which covers a large share of the mixture models used in practice. With information geometry, statistical problems can be treated with geometric tools. This framework offers new perspectives for designing algorithms that are both fast and generic. Two main contributions are proposed. The first is a method for simplifying kernel density estimators. The simplification is carried out with a clustering algorithm, first using the Bregman divergence and then, for the sake of speed, using the Fisher-Rao distance and model centroids. The second contribution is a generalization of the k-MLE algorithm that handles mixtures whose components do not all belong to the same family: this method is applied to mixtures of generalized Gaussians and to mixtures of Gamma distributions, and it is faster than existing methods. The description of these two methods is accompanied by a complete software implementation, and their effectiveness is evaluated through applications in bioinformatics and texture classification.
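To make the first contribution more concrete, the following is a minimal sketch of how a kernel density estimator could be simplified by Bregman hard clustering. It is not the implementation distributed with the thesis. It assumes a univariate Gaussian kernel with a fixed bandwidth, and the names `kl_gauss` and `simplify_kde` are hypothetical. Each kernel is treated as one Gaussian component, components are assigned to the closest centroid in the sense of the Kullback-Leibler divergence (the Bregman divergence associated with the Gaussian family), and each centroid is updated by averaging expectation parameters, which amounts to moment matching. The Fisher-Rao and model-centroid variant mentioned in the abstract is not covered here.

```python
import numpy as np


def kl_gauss(m_p, v_p, m_q, v_q):
    """KL(N(m_p, v_p) || N(m_q, v_q)) for univariate Gaussians (v = variance)."""
    return 0.5 * (np.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)


def simplify_kde(samples, bandwidth, k, n_iter=50, seed=0):
    """Reduce a Gaussian KDE (one kernel per sample) to a k-component mixture.

    Hypothetical helper for illustration only: a Lloyd-style loop using the
    KL divergence between Gaussians as the clustering distortion.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    v0 = bandwidth ** 2

    # Expectation parameters (E[x], E[x^2]) of every kernel N(x_i, v0).
    eta = np.stack([samples, samples ** 2 + v0], axis=1)

    # Initialise the centroids with k distinct kernels.
    idx = rng.choice(len(samples), size=k, replace=False)
    c_mean = samples[idx].copy()
    c_var = np.full(k, v0)

    for _ in range(n_iter):
        # Assignment step: closest centroid in the sense of KL(kernel || centroid).
        d = kl_gauss(samples[:, None], v0, c_mean[None, :], c_var[None, :])
        labels = d.argmin(axis=1)

        # Update step: the centroid is the arithmetic mean of the expectation
        # parameters of its members, i.e. moment matching.
        for j in range(k):
            members = eta[labels == j]
            if len(members) == 0:
                continue  # empty cluster: keep the previous centroid
            m1, m2 = members.mean(axis=0)
            c_mean[j] = m1
            c_var[j] = m2 - m1 ** 2

    # Mixture weights are the fractions of kernels assigned to each centroid.
    weights = np.bincount(labels, minlength=k) / len(samples)
    return weights, c_mean, c_var
```

Called as `simplify_kde(samples, bandwidth=0.3, k=2)`, the function returns the weights, means and variances of the simplified mixture. Because the update step matches moments, the simplified mixture keeps the overall mean and second moment of the original estimator, whatever the clustering turns out to be.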
