Paper Information - My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013

Deep Learning has attracted significant attention in recent years. Here I present a brief overview of my first Deep Learner of 1991, and its historic context, with a timeline of Deep Learning highlights.

Jürgen Schmidhuber

[1] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .

[2] Jürgen Schmidhuber. Neural Sequence Chunkers , 1991 .

[3] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[4] Grégoire Montavon,et al. Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.

[5] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks , 1989, Neural Computation.

[7] Jürgen Schmidhuber,et al. Transfer learning for Latin and Chinese characters with Deep Neural Networks , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[8] Jürgen Schmidhuber,et al. A Fixed Size Storage O(n^3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks , 1992, Neural Computation.

[9] S. Dreyfus. The numerical solution of variational problems , 1962 .

[10] J. H. Wilkinson. The algebraic eigenvalue problem , 1966 .

[11] Shun-ichi Amari,et al. A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..

[12] Jürgen Schmidhuber,et al. Continuous history compression , 1993 .

[13] R. Rohrer,et al. Automated Network Design-The Frequency-Domain Case , 1969 .

[14] Lise Getoor,et al. Learning in Logic , 2010, Encyclopedia of Machine Learning.

[15] Jürgen Schmidhuber,et al. Netzwerkarchitekturen, Zielfunktionen und Kettenregel , 1993 .

[16] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[17] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.

[18] Jürgen Schmidhuber,et al. Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[19] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20] D. George,et al. Hierarchical Temporal Memory: Concepts, Theory, and Terminology , 2006 .

[21] Yann LeCun,et al. Traffic sign recognition with multi-scale Convolutional Networks , 2011, The 2011 International Joint Conference on Neural Networks.

[22] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.

[23] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[24] Henry J. Kelley,et al. Gradient Theory of Optimal Flight Paths , 1960 .

[25] Jordan B. Pollack,et al. Implications of Recursive Distributed Representations , 1988, NIPS.

[26] R. Schapire. The Strength of Weak Learnability , 1990, Machine Learning.

[27] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .

[28] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.

[29] Luca Maria Gambardella,et al. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[30] marquis de L'Hospital. Analyse des infiniment petits, pour l'intelligence des lignes courbes , 1970 .

[31] Luca Maria Gambardella,et al. Fast image scanning with deep max-pooling convolutional neural networks , 2013, 2013 IEEE International Conference on Image Processing.

[32] S. Linnainmaa. Taylor expansion of the accumulated rounding error , 1976 .

[33] Montavon,et al. Deep Learning via Semi-supervised Embedding , 2012, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, Volume 7700.

[34] Maria S. Kulikova,et al. Mitosis detection in breast cancer histological images: An ICPR 2012 contest , 2013, Journal of pathology informatics.

[35] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[36] Geoffrey E. Hinton. Connectionist Learning Procedures , 1989, Artif. Intell..

[37] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[38] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .

[39] Christopher Kermorvant,et al. The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation , 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.

[40] Dan Ciresan,et al. Multi-Column Deep Neural Networks for offline handwritten Chinese character classification , 2013, 2015 International Joint Conference on Neural Networks (IJCNN).

[41] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[42] Jürgen Schmidhuber,et al. A fast learning algorithm for image segmentation with max-pooling convolutional networks , 2013, 2013 IEEE International Conference on Image Processing.

[43] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[44] Luca Maria Gambardella,et al. Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks , 2013, MICCAI.

[45] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[46] Henry S. Baird,et al. Document image defect models , 1995 .

[47] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[48] Sven Behnke,et al. Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition , 2010, ICANN.

[49] P. Werbos. Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities , 2006 .

[50] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[51] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[52] Luca Maria Gambardella,et al. Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.

[53] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[54] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[56] Paul J. Werbos,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[57] Sven Behnke,et al. Hierarchical Neural Networks for Image Interpretation (Lecture Notes in Computer Science) , 2003 .

[58] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[59] Jürgen Schmidhuber,et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks , 1989 .

[60] Alex Graves,et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[61] S. Dreyfus. The computational solution of optimal control problems with time lag , 1973 .

[62] Sven Behnke,et al. Hierarchical Neural Networks for Image Interpretation , 2003, Lecture Notes in Computer Science.

[63] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[64] Jürgen Schmidhuber,et al. A committee of neural networks for traffic sign classification , 2011, The 2011 International Joint Conference on Neural Networks.

[65] Kunihiko Fukushima,et al. Artificial vision by multi-layered neural networks: Neocognitron and its advances , 2013, Neural Networks.

[66] R. Kurzweil. How to Create a Mind: The Secret of Human Thought Revealed , 2012 .

[67] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[68] Patrice Y. Simard,et al. High Performance Convolutional Neural Networks for Document Processing , 2006 .

[69] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[70] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[71] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[72] Yoshua Bengio,et al. Large-Scale Feature Learning With Spike-and-Slab Sparse Coding , 2012, ICML.

[73] Jürgen Schmidhuber,et al. On Fast Deep Nets for AGI Vision , 2011, AGI.

[74] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.

[75] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[76] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[77] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[78] Dana H. Ballard,et al. Modular Learning in Neural Networks , 1987, AAAI.