Learning and Inferring Representations of Data in Neural Networks

Finding useful representations of data to facilitate the generation of scientific knowledge is a ubiquitous goal across disciplines. Until the development of machine learning and statistical methods with hidden or latent representations, useful representations of data were generated “by hand” through scientific modeling or simple measurement observations. Scientific models often make explicit the underlying structure of the system that generates the data we observe and measure. To test a model, one must infer its free parameters and the distributions of its latent (unmeasured) variables, conditioned on the collected data. Many scientific disciplines, such as astronomy, particle physics, wildlife conservation, and neuroscience, are now collecting datasets so large and complex that no human will ever inspect and analyze every measurement by hand. Datasets of this scale present an interesting scientific opportunity: deriving insight into the structure of natural systems by building models that adapt themselves to the latent structure of large amounts of data, an approach often called data-driven hypothesis testing. The three topics of this work fall under this umbrella but are largely independent research directions. First, we show how deep learning can infer representations of neural data, and we use these representations to find the limits of information content in sparsely sampled neural activity and to improve the performance of brain-computer interfaces. Second, we derive a circuit model for a network of neurons that implements approximate inference in a probabilistic model under the biological constraint that each neuron performs only local computations.
Finally, we provide a theoretical and empirical analysis of a family of methods for learning linear representations with low coherence (low cosine similarity between representation elements), and show that such linear methods have limited applicability compared with nonlinear, recurrent models that solve the same problem. Together, these results provide insight into how scientists and the brain can learn useful representations of data in deep and single-layer networks.
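To make the coherence measure concrete, the sketch below uses one common definition: the largest absolute cosine similarity between any two distinct elements of a representation (dictionary). It is an illustration only and is not the learning method analyzed in this work. For comparison, it also computes the Welch bound, a standard lower bound on the coherence of N unit vectors in d dimensions (N > d), which shows that an overcomplete linear dictionary cannot make its coherence arbitrarily small. The helper names `mutual_coherence` and `welch_bound` are hypothetical.

```python
import math
from itertools import combinations


def mutual_coherence(columns):
    """Largest absolute cosine similarity between two distinct dictionary elements.

    `columns` is a list of equal-length, nonzero vectors (hypothetical input format).
    """
    if len(columns) < 2:
        raise ValueError("need at least two dictionary elements")

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("dictionary elements must be nonzero")
        return [x / norm for x in v]

    unit = [normalize(v) for v in columns]
    # Cosine similarity of unit vectors is their dot product; take the worst pair.
    return max(abs(sum(a * b for a, b in zip(u, w))) for u, w in combinations(unit, 2))


def welch_bound(n, d):
    """Lower bound on the coherence of n unit vectors in d dimensions, for n > d."""
    return math.sqrt((n - d) / (d * (n - 1)))


# Three unit vectors spaced 120 degrees apart in the plane: an overcomplete
# dictionary (3 elements in 2 dimensions) whose coherence meets the Welch bound.
frame = [[1.0, 0.0], [-0.5, math.sqrt(3) / 2], [-0.5, -math.sqrt(3) / 2]]
print(mutual_coherence(frame), welch_bound(3, 2))
```

An orthonormal basis has coherence 0, but once a linear dictionary has more elements than dimensions, some pair of elements must overlap by at least the Welch bound.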