Linear Time Model Selection for Mixture of Heterogeneous Components

Our main contribution is a novel model-selection methodology, expectation minimization of description length (EMDL), based on the minimum description length (MDL) principle. EMDL addresses the combinatorial scalability issue that arises in model selection for mixture models with heterogeneous component types, where the goal is to optimize the type of each component as well as the number of components. The key idea in EMDL is to alternate between computing the posterior of the latent variables and minimizing the expected description length of the observed data together with the latent variables. As a result, EMDL computes the optimal model in time linear in both the number of components and the number of available component types, even though the number of candidate models grows exponentially in these quantities. We prove that EMDL is compliant with the MDL principle and thus enjoys its statistical benefits.
