My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013

Deep Learning has attracted significant attention in recent years. Here I present a brief overview of my first Deep Learner of 1991 and its historical context, with a timeline of Deep Learning highlights.
