The Information Sieve

We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning, including independent component analysis, lossy and lossless compression, and predicting missing values in data.
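
To make the layer-wise structure concrete, here is a minimal illustrative sketch in Python of a single sieve pass on binary data. It is not the authors' implementation: the candidate set searched by learn_factor (single columns and pairwise XORs) and the remainder construction (XOR-ing out the most likely value of each variable given the factor) are deliberate simplifications I am assuming for illustration, and all function names here are hypothetical. The score used, sum_i I(X_i; Y) - H(Y), estimates the total correlation explained by Y when Y is a deterministic function of X.

import itertools
import numpy as np

def entropy(labels):
    """Empirical Shannon entropy (in nats) of a discrete sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_info(a, b):
    """Empirical mutual information I(A; B) for binary samples.

    The pairing a * 2 + b uniquely encodes the joint outcome for
    values in {0, 1}.
    """
    return entropy(a) + entropy(b) - entropy(a * 2 + b)

def tc_explained(X, y):
    """Estimate of total correlation explained by y = f(X):
    sum_i I(X_i; y) - H(y). The H(y) term equals I(X; y) because y
    is a deterministic function of X."""
    return sum(mutual_info(X[:, i], y) for i in range(X.shape[1])) - entropy(y)

def learn_factor(X):
    """Pick the best factor from a small candidate set.

    Illustrative simplification: the paper optimizes over general
    discrete functions; here we only consider single columns and
    pairwise XORs of columns.
    """
    n = X.shape[1]
    candidates = [X[:, i] for i in range(n)]
    candidates += [X[:, i] ^ X[:, j]
                   for i, j in itertools.combinations(range(n), 2)]
    scores = [tc_explained(X, y) for y in candidates]
    return candidates[int(np.argmax(scores))]

def remainder(X, y):
    """Simplified remainder transform: xbar_i = x_i XOR argmax P(x_i | y),
    so x_i stays recoverable from (xbar_i, y) while dependence on y
    is reduced. The paper's construction is more general."""
    Xbar = np.empty_like(X)
    for i in range(X.shape[1]):
        for v in (0, 1):
            mask = (y == v)
            if mask.any():
                likely = int(X[mask, i].mean() >= 0.5)
                Xbar[mask, i] = X[mask, i] ^ likely
    return Xbar

# One pass of the sieve on synthetic data: five noisy copies of a
# hidden binary common cause. The first factor should capture the
# common cause; the remainder should carry little dependence.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500)
X = np.array([z ^ (rng.random(500) < 0.1) for _ in range(5)]).T.astype(int)
y = learn_factor(X)
print("TC explained by first factor:", tc_explained(X, y))
Xbar = remainder(X, y)
print("TC explained in remainder:", tc_explained(Xbar, learn_factor(Xbar)))

Running this, the first factor recovers (a noisy copy of) the hidden common cause and accounts for nearly all the measured dependence, while the remainder columns are close to independent noise, mirroring the decomposition described above.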
