Information Theoretic Model Selection for Pattern Analysis

Exploratory data analysis requires (i) defining a set of patterns hypothesized to exist in the data, (ii) specifying a suitable quantification principle or cost function to rank these patterns, and (iii) validating the inferred patterns. For data clustering, the patterns are partitionings of the objects into k groups; for PCA or truncated SVD, the patterns are orthogonal transformations with projections onto a low-dimensional space. We propose an information-theoretic principle for model selection and model-order selection. Our principle ranks competing pattern cost functions according to their ability to extract context-sensitive information from noisy data with respect to the chosen hypothesis class. Sets of approximate solutions serve as the basis for a communication protocol. Following Buhmann (2010), the inferred models maximize the so-called approximation capacity, i.e., the mutual information between coarsened training-data patterns and coarsened test-data patterns. We demonstrate how to apply our validation framework to the well-known Gaussian mixture model and to a multi-label clustering approach for role mining in binary user-privilege assignments.
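To make the model-order selection idea concrete, the following minimal Python sketch scores Gaussian mixture models of increasing order k by the mutual information between the cluster labels that two independently trained models assign to held-out data. This is a simplified proxy for the approximation capacity, not the paper's exact communication protocol; the toy data set, the split sizes, and the use of scikit-learn's GaussianMixture and mutual_info_score are illustrative assumptions.

```python
# Hypothetical sketch: select the number of mixture components k by the
# shared information between two independently fitted "coarsenings" of the
# data, evaluated on held-out points. A crude stand-in for the paper's
# approximation-capacity criterion, NOT its exact protocol.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
# Toy data: three well-separated Gaussian blobs in 2-D (600 points total).
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2))
               for c in ([0.0, 0.0], [4.0, 0.0], [2.0, 4.0])])
rng.shuffle(X)
X1, X2, X_test = X[:200], X[200:400], X[400:]

scores = {}
for k in range(1, 7):
    # Two independently trained models on disjoint halves of the data.
    gmm1 = GaussianMixture(n_components=k, random_state=0).fit(X1)
    gmm2 = GaussianMixture(n_components=k, random_state=1).fit(X2)
    # Mutual information (in nats) between the two label fields on test data.
    scores[k] = mutual_info_score(gmm1.predict(X_test), gmm2.predict(X_test))

best_k = max(scores, key=scores.get)
print(scores, "-> selected k =", best_k)
```

For three well-separated blobs one would expect the shared information to saturate near log 3 nats once k reaches 3. Raw mutual information can still creep upward for larger k when finer splits partially agree; the full approximation-capacity criterion of the paper additionally accounts for the noise level, which this sketch does not.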

[1]  G. Schwarz, "Estimating the Dimension of a Model," The Annals of Statistics, 1978.

[2]  S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984.

[3]  T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 2005.

[4]  H. Sompolinsky et al., "Scaling laws in learning of classification tasks," Physical Review Letters, 1993.

[5]  H. Sompolinsky et al., "Statistical mechanics of the maximum-likelihood density estimation," Physical Review E, 1994.

[6]  V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.

[7]  K. Rose, "Deterministic annealing for clustering, compression, classification, regression, and related optimization problems," Proceedings of the IEEE, 1998.

[8]  K. P. Burnham and D. R. Anderson, Model Selection and Inference: A Practical Information-Theoretic Approach, Springer, 2000.

[9]  N. Tishby, F. C. Pereira, and W. Bialek, "The information bottleneck method," arXiv, 2000.

[10]  S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, 2002.

[11]  K. P. Burnham and D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed., Springer, 2003.

[12]  T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.

[13]  T. Lange, V. Roth, M. L. Braun, and J. M. Buhmann, "Stability-Based Validation of Clustering Solutions," Neural Computation, 2004.

[14]  S. Still and W. Bialek, "How Many Clusters? An Information-Theoretic Perspective," Neural Computation, 2003.

[15]  S. Ben-David, U. von Luxburg, and D. Pál, "A Sober Look at Clustering Stability," COLT, 2006.

[16]  M. Biehl et al., "Phase transitions in vector quantization and neural gas," Neurocomputing, 2009.

[17]  A. P. Streich, M. Frank, D. Basin, and J. M. Buhmann, "Multi-assignment clustering for Boolean data," ICML, 2009.

[18]  J. M. Buhmann, "Information theoretic model validation for clustering," IEEE International Symposium on Information Theory (ISIT), 2010.

[19]  M. Frank and J. M. Buhmann, "Selecting the rank of SVD by Maximum Approximation Capacity," arXiv, 2011.

[20]  J. M. Buhmann et al., "The Minimum Transfer Cost Principle for Model-Order Selection," ECML/PKDD, 2011.

[21]  J. M. Buhmann, "Context Sensitive Information: Model Validation by Information Theory," ICPRAM, 2012.

[22]  J. M. Buhmann et al., "The information content in sorting algorithms," IEEE International Symposium on Information Theory (ISIT), 2012.
