Relevant sparse codes with variational information bottleneck

In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximizes information about a 'relevance' variable, Y, while constraining the information encoded about the original data, X. Unfortunately, the IB method is computationally demanding when data are high-dimensional and/or non-Gaussian. Here we propose an approximate variational scheme for maximizing a lower bound on the IB objective, analogous to variational EM. Using this method, we derive an IB algorithm to recover features that are both relevant and sparse. Finally, we demonstrate how kernelized versions of the algorithm can be used to address a broad range of problems with non-linear relations between X and Y.
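To fix notation, here is a minimal sketch of the standard variational bounds that make such a scheme tractable, with T denoting the compressed code; the exact parameterization optimized in the paper may differ. Both the decoder q(y|t) and the marginal r(t) are free variational choices, and the relevance bound is the Barber-Agakov bound underlying the IM algorithm:

```latex
% IB Lagrangian: maximize relevance I(T;Y) while penalizing the
% rate I(T;X), with trade-off parameter \beta > 0.
\mathcal{L}_{\mathrm{IB}} \;=\; I(T;Y) \;-\; \beta\, I(T;X)

% Relevance: for any variational decoder q(y|t), since
% KL( p(y|t) \,\|\, q(y|t) ) \ge 0,
I(T;Y) \;\ge\; H(Y) \;+\; \mathbb{E}_{p(x,y)\,p(t\mid x)}\!\left[\log q(y \mid t)\right]

% Rate: for any variational marginal r(t), since KL( p(t) \,\|\, r(t) ) \ge 0,
I(T;X) \;\le\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(p(t \mid x)\,\middle\|\,r(t)\right)\right]
```

Substituting both bounds into the Lagrangian gives a tractable lower bound on the IB objective (up to the constant H(Y)); alternately tightening it over q and r and maximizing it over the encoder p(t|x) is what makes the scheme analogous to variational EM.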

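To make the alternation concrete, below is a hedged toy implementation under linear-Gaussian assumptions with a fixed standard-normal marginal r(t). This is not the paper's algorithm, which additionally handles sparse priors and kernelized inputs; all names and constants here (W, U, sigma, beta, the step size) are illustrative.

```python
# Toy variational IB: linear-Gaussian encoder p(t|x) = N(W^T x, sigma^2 I),
# decoder q(y|t) = N(U^T t, s2), fixed marginal r(t) = N(0, I).
# All expectations over the encoder noise are available in closed form.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on only two components of x.
n, dx, dt = 500, 10, 3
X = rng.standard_normal((n, dx))
w_true = np.zeros(dx)
w_true[:2] = [2.0, -1.0]
Y = X @ w_true + 0.1 * rng.standard_normal(n)

beta, sigma = 0.1, 0.5                     # rate trade-off, encoder noise (fixed)
W = 0.01 * rng.standard_normal((dx, dt))   # encoder weights
U = np.zeros(dt)                           # decoder weights
s2 = 1.0                                   # decoder output variance

def bound(W, U, s2):
    """Variational lower bound on I(T;Y) - beta * I(T;X), up to +H(Y)."""
    M = X @ W                              # encoder means, one row per sample
    resid = Y - M @ U
    # E[log q(y|t)] with t = m + sigma * eps:
    e_loglik = -0.5 * (np.log(2 * np.pi * s2)
                       + (resid**2 + sigma**2 * (U @ U)) / s2).mean()
    # KL( N(m, sigma^2 I) || N(0, I) ), averaged over samples:
    kl = 0.5 * ((M**2).sum(axis=1) + dt * sigma**2
                - dt - dt * np.log(sigma**2)).mean()
    return e_loglik - beta * kl

for it in range(300):
    M = X @ W
    # "M-step": closed-form decoder update (ridge-like; the sigma^2 term
    # accounts for encoder noise in the expected squared error).
    A = M.T @ M / n + sigma**2 * np.eye(dt)
    U = np.linalg.solve(A, M.T @ Y / n)
    resid = Y - M @ U
    s2 = (resid**2).mean() + sigma**2 * (U @ U)
    # "E-step": one gradient-ascent step on the encoder weights.
    g_W = X.T @ (resid[:, None] * U[None, :]) / (n * s2) - beta * X.T @ M / n
    W += 0.1 * g_W
    if it % 50 == 0:
        print(f"iter {it:3d}  bound = {bound(W, U, s2):.4f}")
```

On this synthetic data, the printed bound should trend upward as the encoder rotates toward the two predictive components of X, while the beta term discourages encoding the remaining, irrelevant directions.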