Data-Efficient Mutual Information Neural Estimator

Measuring Mutual Information (MI) between high-dimensional, continuous random variables from observed samples has wide theoretical and practical applications. Recent work, MINE (Belghazi et al. 2018), focused on estimating tight variational lower bounds of MI using neural networks, but assumed an unlimited supply of samples to prevent overfitting. In real-world applications, data is not always available in surplus. In this work, we focus on improving data efficiency and propose a Data-Efficient MINE Estimator (DEMINE), based on a relaxed predictive MI lower bound that can be estimated with orders-of-magnitude higher data efficiency. The predictive MI lower bound also enables a new meta-learning approach using task augmentation, Meta-DEMINE, which improves network generalization and further boosts estimation accuracy empirically. With improved data efficiency, our estimators enable statistical testing of dependency at practical dataset sizes. We demonstrate the effectiveness of our estimators on synthetic benchmarks and on real-world fMRI data, with an application to inter-subject correlation analysis.
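The abstract does not spell out the estimator itself, so as a rough illustration of the kind of variational lower bound MINE-style methods optimize, below is a minimal sketch of a Donsker-Varadhan MI lower bound trained with a small statistics network in PyTorch. The network architecture, hyperparameters, and the final train/held-out split comment are illustrative assumptions, not the authors' exact DEMINE or Meta-DEMINE procedure.

```python
# Minimal sketch (assumed, not the paper's implementation) of the
# Donsker-Varadhan lower bound used by MINE-style estimators:
#   I(X; Y) >= E_p(x,y)[T(x, y)] - log E_p(x)p(y)[exp(T(x, y))]
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """Small MLP critic T(x, y); size and depth are illustrative."""
    def __init__(self, dim_x, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def dv_lower_bound(T, x, y):
    """Donsker-Varadhan MI lower bound on one batch of paired samples."""
    joint = T(x, y).mean()                     # E_p(x,y)[T]
    y_shuffled = y[torch.randperm(y.size(0))]  # shuffle y to approximate p(x)p(y)
    log_marginal = torch.logsumexp(T(x, y_shuffled), dim=0) \
        - torch.log(torch.tensor(float(x.size(0))))  # log E_p(x)p(y)[exp(T)]
    return joint - log_marginal

# Toy usage: correlated Gaussians, where the true MI is known analytically.
dim = 5
x = torch.randn(512, dim)
y = x + 0.5 * torch.randn(512, dim)
T = StatisticsNetwork(dim, dim)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = -dv_lower_bound(T, x, y)  # maximize the bound
    loss.backward()
    opt.step()
# In a predictive (DEMINE-style) setting, T would be fit on a training split
# and the bound evaluated on held-out samples to keep the estimate valid
# under limited data; that split-based evaluation is the key idea hedged here.
```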

[1] Brian B. Avants, et al. N4ITK: Improved N3 Bias Correction, 2010, IEEE Transactions on Medical Imaging.

[2] Timothy O. Laumann, et al. Methods to detect, characterize, and remove motion artifact in resting state fMRI, 2014, NeuroImage.

[3] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[4] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[5] D. Poeppel, et al. Coupled neural systems underlie the production and comprehension of naturalistic narrative speech, 2014, Proceedings of the National Academy of Sciences.

[6] Thomas T. Liu, et al. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI, 2007, NeuroImage.

[7] Michael Brady, et al. Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images, 2002, NeuroImage.

[8] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[9] Martin J. Wainwright, et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, 2008, IEEE Transactions on Information Theory.

[10] Igor Vajda, et al. Estimation of the Information by an Adaptive Partitioning of the Observation Space, 1999, IEEE Transactions on Information Theory.

[11] Satrajit S. Ghosh, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments, 2016, Scientific Data.

[12] Sergey Levine, et al. One-Shot Visual Imitation Learning via Meta-Learning, 2017, CoRL.

[13] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.

[14] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.

[15] Bryan R. Conroy, et al. A Common, High-Dimensional Model of the Representational Space in Human Ventral Temporal Cortex, 2011, Neuron.

[16] R. Malach, et al. Intersubject Synchronization of Cortical Activity During Natural Vision, 2004, Science.

[17] Wenbin Li, et al. Evaluation of Field Map and Nonlinear Registration Methods for Correction of Susceptibility Artifacts in Diffusion MRI, 2017, Frontiers in Neuroinformatics.

[18] Pramod Viswanath, et al. Demystifying fixed k-nearest neighbor information estimators, 2017, IEEE International Symposium on Information Theory (ISIT).

[19] C. Almli, et al. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood, 2009, NeuroImage.

[20] David Barber, et al. The IM algorithm: a variational approach to Information Maximization, 2003, NIPS.

[21] D. Heeger, et al. Reliability of cortical activity during natural stimulation, 2010, Trends in Cognitive Sciences.

[22] Ian J. Goodfellow, et al. NIPS 2016 Tutorial: Generative Adversarial Networks, 2016, arXiv.

[23] Stephen M. Smith, et al. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, 2001, IEEE Transactions on Medical Imaging.

[24] Fraser, et al. Independent coordinates for strange attractors from mutual information, 1986, Physical Review A, General Physics.

[25] Yoshua Bengio, et al. Bayesian Model-Agnostic Meta-Learning, 2018, NeurIPS.

[26] A. Kraskov, et al. Estimating mutual information, 2003, Physical Review E, Statistical, Nonlinear, and Soft Matter Physics.

[27] Sergey Levine, et al. Probabilistic Model-Agnostic Meta-Learning, 2018, NeurIPS.

[28] A. Dale, et al. Characterization and Correction of Geometric Distortions in 814 Diffusion Weighted Images, 2016, PLoS ONE.

[29] Brian B. Avants, et al. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain, 2008, Medical Image Analysis.

[30] S. Garrod, et al. Brain-to-brain coupling: a mechanism for creating and sharing a social world, 2012, Trends in Cognitive Sciences.

[31] Karl Stratos, et al. Formal Limitations on the Measurement of Mutual Information, 2018, AISTATS.

[32] J. S. Guntupalli, et al. A Model of Representational Spaces in Human Cortex, 2016, Cerebral Cortex.

[33] Jesper Andersson, et al. A multi-modal parcellation of human cerebral cortex, 2016, Nature.

[34] David D. Cox, et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, 2013, ICML.

[35] Satrajit S. Ghosh, et al. Nipype: A Flexible, Lightweight and Extensible Neuroimaging Data Processing Framework in Python, 2011, Frontiers in Neuroinformatics.

[36] Ibrahim A. Ahmad, et al. A nonparametric estimation of the entropy for absolutely continuous distributions (Corresp.), 1976, IEEE Transactions on Information Theory.

[37] Yoshua Bengio, et al. Mutual Information Neural Estimation, 2018, ICML.

[38] R. W. Cox, et al. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages, 1996, Computers and Biomedical Research.

[39] U. Hasson, et al. Speaker–listener neural coupling underlies successful communication, 2010, Proceedings of the National Academy of Sciences.

[40] Sreeram Kannan, et al. Estimating Mutual Information for Discrete-Continuous Mixtures, 2017, NIPS.

[41] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[42] Christopher J. Honey, et al. Loss of reliable temporal structure in event-related averaging of naturalistic stimuli, 2012, NeuroImage.

[43] Bruce Fischl, et al. Accurate and robust brain image alignment using boundary-based registration, 2009, NeuroImage.

[44] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[45] Janice Chen, et al. Dynamic reconfiguration of the default mode network during narrative comprehension, 2016, Nature Communications.

[46] Marleen B. Schippers, et al. Mapping the information flow from one brain to another during gestural communication, 2010, Proceedings of the National Academy of Sciences.

[47] Satrajit S. Ghosh, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI, 2018, bioRxiv.

[48] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.

[49] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.

[50] Aäron van den Oord, et al. On variational lower bounds of mutual information, 2018.