Advances in Minimum Description Length: Theory and Applications

The process of inductive inference -- inferring general laws and principles from particular instances -- is the basis of statistical modeling, pattern recognition, and machine learning. The Minimum Description Length (MDL) principle, a powerful method of inductive inference, holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data: the more we are able to compress the data, the more we learn about the regularities underlying it. Advances in Minimum Description Length is a sourcebook that introduces the scientific community to the foundations of MDL, recent theoretical advances, and practical applications.

The book begins with an extensive tutorial on MDL, covering its theoretical underpinnings, practical implications, various interpretations, and underlying philosophy. The tutorial includes a brief history of MDL, from its roots in the notion of Kolmogorov complexity to the beginning of MDL proper. The book then presents recent theoretical advances, introducing modern MDL methods in a way that is accessible to readers from many different scientific fields. It concludes with examples of how to apply MDL in research settings that range from bioinformatics and machine learning to psychology.
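The compression idea stated above can be made concrete with a small sketch. It is not taken from the book. It assumes a crude two-part code, in which a model's total cost is the bits needed to describe its parameter plus the bits needed to describe the data given that parameter, and it compares two hypothetical models for a binary sequence. The function names and the parameter cost of roughly half of log2(n) bits are illustrative assumptions.

```python
import math


def fair_coin_bits(bits):
    """Code length under a fixed fair-coin model: one bit per symbol, no parameters."""
    return float(len(bits))


def bernoulli_two_part_bits(bits):
    """Crude two-part code length for a Bernoulli model fitted to the data.

    Assumes a non-empty sequence of 0s and 1s.
    """
    n = len(bits)
    k = sum(bits)
    # Assumption: the single parameter is encoded to precision about 1/sqrt(n),
    # which costs roughly 0.5 * log2(n) bits.
    param_cost = 0.5 * math.log2(n)
    p = k / n
    # Data cost: negative log-likelihood in bits under the fitted parameter.
    data_cost = 0.0
    for count, prob in ((k, p), (n - k, 1.0 - p)):
        if count > 0:
            data_cost -= count * math.log2(prob)
    return param_cost + data_cost


def select_model(bits):
    """Return the name of the model with the shortest total code length, plus all costs."""
    costs = {
        "fair_coin": fair_coin_bits(bits),
        "bernoulli": bernoulli_two_part_bits(bits),
    }
    return min(costs, key=costs.get), costs


skewed = [1] * 90 + [0] * 10   # strong regularity: mostly ones
balanced = [1, 0] * 50         # equal counts of ones and zeros

print(select_model(skewed))
print(select_model(balanced))
```

For the skewed sequence the fitted Bernoulli model compresses the data far below one bit per symbol, so it is selected despite the extra cost of its parameter. For the balanced sequence the parameter buys no compression, so the simpler fair-coin model is selected. Because a Bernoulli model ignores order, this sketch cannot exploit the alternating pattern in the second sequence; a richer model class would be needed for that.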
