Sifting Common Information from Many Variables

Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among a group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of Wyner's common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed-point solution, converges quickly, and has complexity linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.
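As a rough illustration of the incremental structure described above (extract one latent factor, then pass "remainder" information to the next layer), the sketch below assumes a linear/Gaussian setting in which each factor is a linear combination of the observed variables and remainders are formed by regressing the factor out of each variable. The factor update here is a simple placeholder (the leading eigenvector of the sample covariance) standing in for the paper's fixed-point solution, and all function and variable names are hypothetical.

```python
import numpy as np

def sieve_layer(X):
    """Extract one latent factor y = X @ w and return (y, remainder).

    Placeholder factor update: leading eigenvector of the sample covariance
    (NOT the paper's fixed-point solution; used only to illustrate the
    incremental extract-then-remainder structure of the sieve).
    """
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    w = eigvecs[:, -1]                      # direction of the extracted factor
    y = X @ w                               # one factor value per sample
    # Remainder: subtract the best linear prediction of each variable from y,
    # so the next layer only sees information not already captured by y.
    coeffs = X.T @ y / (y @ y)              # per-variable regression coefficients
    remainder = X - np.outer(y, coeffs)
    return y, remainder

def sieve(X, n_layers=3):
    """Run a few incremental layers, collecting one factor per layer."""
    X = X - X.mean(axis=0)
    factors = []
    for _ in range(n_layers):
        y, X = sieve_layer(X)
        factors.append(y)
    return np.column_stack(factors)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(1000, 2))          # two hidden sources
    A = rng.normal(size=(2, 20))            # mixing into 20 observed variables
    X = Z @ A + 0.1 * rng.normal(size=(1000, 20))
    Y = sieve(X, n_layers=2)
    print(Y.shape)                          # (1000, 2) extracted factors
```

The point of the sketch is the loop structure: each layer extracts a single factor and hands only the residual variation to the next layer, so factors are learned one at a time rather than jointly.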
