Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists

Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires large datasets and can involve millions of connection weights, making simulations on standard computers infeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphics cards (graphics processing units, GPUs) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphics card can outperform a small high-performance computing cluster in terms of learning time, with no loss of learning quality. We therefore conclude that graphics card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.
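
As a concrete illustration of the high-level approach advocated above, the sketch below implements a single contrastive-divergence (CD-1) weight update for a restricted Boltzmann machine, the building block of a deep belief network. This is a minimal didactic example rather than the code used in the article: we assume Python with the NumPy array library, and the names, sizes, and learning rate are illustrative. The point mirrors the abstract's claim about high-level routines: importing a NumPy-compatible GPU library such as CuPy in place of NumPy (see the first comment) would run the very same lines on the graphics card, with no explicit parallel programming.

    import numpy as xp  # swap for "import cupy as xp" to run the same code on a GPU

    xp.random.seed(0)  # reproducible weights and samples

    n_visible, n_hidden, batch_size, lr = 784, 500, 128, 0.05  # illustrative sizes
    W = 0.01 * xp.random.randn(n_visible, n_hidden)  # visible-to-hidden weights
    b_v = xp.zeros(n_visible)                        # visible biases
    b_h = xp.zeros(n_hidden)                         # hidden biases

    def sigmoid(x):
        return 1.0 / (1.0 + xp.exp(-x))

    def cd1_step(v0):
        # One CD-1 update from a mini-batch of binary visible vectors v0.
        global W, b_v, b_h
        # Positive phase: infer hidden probabilities from the data, then sample.
        p_h0 = sigmoid(v0 @ W + b_h)
        h0 = (xp.random.rand(*p_h0.shape) < p_h0).astype(v0.dtype)
        # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
        p_v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(p_v1 @ W + b_h)
        # Contrastive-divergence gradient estimate, averaged over the mini-batch.
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch_size
        b_v += lr * (v0 - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)

    # Example usage: one update on random binary data standing in for image patches.
    v0 = (xp.random.rand(batch_size, n_visible) < 0.5).astype(xp.float64)
    cd1_step(v0)

A deep belief network is then obtained by greedily stacking such layers, training each new restricted Boltzmann machine on the hidden activations produced by the layer below. Because the update reduces to a handful of matrix products, the array library can dispatch it to the GPU transparently, which is why no parallel-programming expertise is needed.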
