Draft: Deep Learning in Neural Networks: An Overview

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

[1]  Amir F. Atiya,et al.  New results on recurrent network training: unifying the algorithms and accelerating convergence , 2000, IEEE Trans. Neural Networks Learn. Syst..

[2]  Kiyotoshi Matsuoka,et al.  Noise injection into inputs in back-propagation learning , 1992, IEEE Trans. Syst. Man Cybern..

[3]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[4]  R. Zemel A minimum description length framework for unsupervised learning , 1994 .

[5]  W. Vent,et al.  Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 S. mit 36 Abb. Frommann‐Holzboog‐Verlag. Stuttgart 1973. Broschiert , 1975 .

[6]  Douglas B. Lenat,et al.  Why AM and EURISKO Appear to Work , 1984, Artif. Intell..

[7]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Marvin Minsky,et al.  Steps toward Artificial Intelligence , 1995, Proceedings of the IRE.

[9]  Peter M. Todd,et al.  Designing Neural Networks using Genetic Algorithms , 1989, ICGA.

[10]  Lawrence Davis,et al.  Training Feedforward Neural Networks Using Genetic Algorithms , 1989, IJCAI.

[11]  Jürgen Schmidhuber,et al.  Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.

[12]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Anne Condon,et al.  On the undecidability of probabilistic planning and related stochastic optimization problems , 2003, Artif. Intell..

[14]  Juha Karhunen,et al.  Generalizations of principal component analysis, optimization problems, and neural networks , 1995, Neural Networks.

[15]  Geoffrey E. Hinton,et al.  Learning Population Codes by Minimizing Description Length , 1993, Neural Computation.

[16]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[17]  Peter Tiňo,et al.  Learning long-term dependencies is not as difficult with NARX recurrent neural networks , 1995 .

[18]  P. Földiák,et al.  Forming sparse representations by local anti-Hebbian learning , 1990, Biological Cybernetics.

[19]  J. Baxter,et al.  Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).

[20]  Henry S. Baird,et al.  Document image defect models , 1995 .

[21]  N. Logothetis,et al.  Shape representation in the inferior temporal cortex of monkeys , 1995, Current Biology.

[22]  Vivek S. Borkar,et al.  Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..

[23]  Mark B. Ring Learning Sequential Tasks by Incrementally Adding Higher Orders , 1992, NIPS.

[24]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[25]  Ralph Neuneier,et al.  How to Train Neural Networks , 2012, Neural Networks: Tricks of the Trade.

[26]  Michael C. Mozer,et al.  Induction of Multiscale Temporal Structure , 1991, NIPS.

[27]  Jürgen Schmidhuber,et al.  A Clockwork RNN , 2014, ICML.

[28]  Steffen Udluft,et al.  Learning Long Term Dependencies with Recurrent Neural Networks , 2006, ICANN.

[29]  Richard S. Sutton,et al.  Neural networks for control , 1990 .

[30]  Roberto Battiti,et al.  Accelerated Backpropagation Learning: Two Optimization Methods , 1989, Complex Syst..

[31]  Roberto Battiti,et al.  First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method , 1992, Neural Computation.

[32]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..

[33]  F. Faggin,et al.  Neural network hardware , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[34]  Sukhan Lee,et al.  A Gaussian potential function network with hierarchically self-organizing learning , 1991, Neural Networks.

[35]  Tapani Raiko,et al.  Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.

[36]  Jürgen Schmidhuber,et al.  Recurrent policy gradients , 2010, Log. J. IGPL.

[37]  Nikola K. Kasabov,et al.  NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data , 2014, Neural Networks.

[38]  Janet Wiles,et al.  Recurrent Neural Networks Can Learn to Implement Symbol-Sensitive Counting , 1997, NIPS.

[39]  David B. Fogel,et al.  Evolving Neural Control Systems , 1995, IEEE Expert.

[40]  Tom M. Mitchell,et al.  Explanation-Based Generalization: A Unifying View , 1986, Machine Learning.

[41]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[42]  Mark B. Ring Incremental Development of Complex Behaviors , 1991, ML.

[43]  Benjamin Schrauwen,et al.  An overview of reservoir computing: theory, applications and implementations , 2007, ESANN.

[44]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[45]  W. Pitts,et al.  A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.

[46]  Jürgen Schmidhuber,et al.  LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[47]  Marcus Hutter The Fastest and Shortest Algorithm for all Well-Defined Problems , 2002, Int. J. Found. Comput. Sci..

[48]  Jürgen Schmidhuber,et al.  HQ-Learning , 1997, Adapt. Behav..

[49]  Kaspar Anton Schindler,et al.  When pyramidal neurons lock, when they respond chaotically, and when they like to synchronize , 2000, Neuroscience Research.

[50]  Raymond L. Watrous,et al.  Induction of Finite-State Automata Using Second-Order Recurrent Networks , 1991, NIPS.

[51]  Henry J. Kelley,et al.  Gradient Theory of Optimal Flight Paths , 1960 .

[52]  E. Blum,et al.  The Mathematical Theory of Optimal Processes. , 1963 .

[53]  Stephan K. Chalup,et al.  Incremental training of first order recurrent neural networks to predict a context-sensitive language , 2003, Neural Networks.

[54]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[55]  Isabelle Guyon,et al.  Structural Risk Minimization for Character Recognition , 1991, NIPS.

[56]  Mitsuo Kawato,et al.  Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.

[57]  Jordan B. Pollack,et al.  Implications of Recursive Distributed Representations , 1988, NIPS.

[58]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[59]  Bernard Widrow,et al.  Associative Storage and Retrieval of Digital Information in Networks of Adaptive “Neurons” , 1962 .

[60]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[61]  Jürgen Schmidhuber,et al.  Continuous history compression , 1993 .

[62]  Ali A. Minai,et al.  Perturbation response in feedforward networks , 1994, Neural Networks.

[63]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[64]  Razvan Pascanu,et al.  How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[65]  John N. Tsitsiklis,et al.  A survey of computational complexity results in systems and control , 2000, Autom..

[66]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[67]  Michael S. Falconbridge,et al.  A Simple Hebbian/Anti-Hebbian Network Learns the Sparse, Independent Components of Natural Images , 2006, Neural Computation.

[68]  Jürgen Schmidhuber,et al.  Discovering Predictable Classifications , 1993, Neural Computation.

[69]  Giovanni Soda,et al.  Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..

[70]  Pierre Baldi,et al.  Understanding Dropout , 2013, NIPS.

[71]  Henry Markram,et al.  The human brain project. , 2012, Scientific American.

[72]  King-Sun Fu,et al.  Syntactic Pattern Recognition And Applications , 1968 .

[73]  E. Rolls,et al.  Neurodynamics of biased competition and cooperation for attention: a model with spiking neurons. , 2005, Journal of neurophysiology.

[74]  Steven Douglas Whitehead,et al.  Reinforcement learning for the adaptive control of perception and action , 1992 .

[75]  John E. Moody,et al.  The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems , 1991, NIPS.

[76]  Ming Yang,et al.  Detecting Human Actions in Surveillance Videos , 2009, TRECVID.

[77]  H. Seung,et al.  Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.

[78]  Robert Desimone,et al.  Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4 , 2005, Science.

[79]  R. Schapire The Strength of Weak Learnability , 1990, Machine Learning.

[80]  Ashwin Ram,et al.  Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[81]  Dr. Marcus Hutter,et al.  Universal artificial intelligence , 2004 .

[82]  Carolo Friederico Gauss Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium , 2014 .

[83]  David Zipser,et al.  Feature Discovery by Competive Learning , 1986, Cogn. Sci..

[84]  Jürgen Schmidhuber,et al.  The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions , 2002, COLT.

[85]  Hecht-Nielsen Theory of the backpropagation neural network , 1989 .

[86]  M. Graziano The Intelligent Movement Machine: An Ethological Perspective on the Primate Motor System , 2008 .

[87]  David Haussler,et al.  What Size Net Gives Valid Generalization? , 1989, Neural Computation.

[88]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[89]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[90]  Jürgen Schmidhuber,et al.  Solving Deep Memory POMDPs with Recurrent Policy Gradients , 2007, ICANN.

[91]  Risto Miikkulainen,et al.  Accelerated Neural Evolution through Cooperatively Coevolved Synapses , 2008, J. Mach. Learn. Res..

[92]  R. Rohrer,et al.  Automated Network Design-The Frequency-Domain Case , 1969 .

[93]  Punit Shah Toward a Neurobiology of Unrealistic Optimism , 2012, Front. Psychology.

[94]  Shun-ichi Amari,et al.  Statistical Theory of Learning Curves under Entropic Loss Criterion , 1993, Neural Computation.

[95]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[96]  Peter Tiño,et al.  Architectural Bias in Recurrent Neural Networks: Fractal Analysis , 2002, Neural Computation.

[97]  Sepp Hochreiter,et al.  Untersuchungen zu dynamischen neuronalen Netzen , 1991 .

[98]  Christopher Kermorvant,et al.  The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation , 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.

[99]  Jürgen Schmidhuber,et al.  Learning to generate sub-goals for action sequences , 1991 .

[100]  Dan Ciresan,et al.  Multi-Column Deep Neural Networks for offline handwritten Chinese character classification , 2013, 2015 International Joint Conference on Neural Networks (IJCNN).

[101]  B. Speelpenning Compiling Fast Partial Derivatives of Functions Given by Algorithms , 1980 .

[102]  D. Shanno Conditioning of Quasi-Newton Methods for Function Minimization , 1970 .

[103]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[104]  Luca Maria Gambardella,et al.  Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks , 2013, MICCAI.

[105]  Michael I. Jordan Serial Order: A Parallel Distributed Processing Approach , 1997 .

[106]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[107]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[108]  G. V. Puskorius,et al.  A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification , 1998, Proc. IEEE.

[109]  M. Hestenes,et al.  Methods of conjugate gradients for solving linear systems , 1952 .

[110]  Jürgen Schmidhuber,et al.  Planning simple trajectories using neural subgoal generators , 1993 .

[111]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[112]  Shimon Whiteson,et al.  Critical factors in the performance of hyperNEAT , 2013, GECCO '13.

[113]  Keiji Tanaka,et al.  Matching Categorical Object Representations in Inferior Temporal Cortex of Man and Monkey , 2008, Neuron.

[114]  J. Stephen Judd,et al.  Optimal stopping and effective machine complexity in learning , 1993, Proceedings of 1995 IEEE International Symposium on Information Theory.

[115]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[116]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[117]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[118]  Esther Levin,et al.  Accelerated Learning in Layered Neural Networks , 1988, Complex Syst..

[119]  Yann LeCun,et al.  Off-Road Obstacle Avoidance through End-to-End Learning , 2005, NIPS.

[120]  Rich Caruana,et al.  Multitask Learning , 1997, Machine Learning.

[121]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[122]  David J. Jilk,et al.  Recurrent Processing during Object Recognition , 2011, Front. Psychol..

[123]  Subhash C. Kak,et al.  Data Mining Using Surface and Deep Agents Based on Neural Networks , 2010, AMCIS.

[124]  Randall D. Beer,et al.  Sequential Behavior and Learning in Evolved Dynamical Neural Networks , 1994, Adapt. Behav..

[125]  Jürgen Schmidhuber,et al.  Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement , 1997, Machine Learning.

[126]  Pasi Koikkalainen,et al.  Self-organizing hierarchical feature maps , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[127]  Chalapathy Neti,et al.  Maximally fault tolerant neural networks , 1992, IEEE Trans. Neural Networks.

[128]  Lucas C. Parra,et al.  Non-linear Feature Extraction by Redundancy Reduction in an Unsupervised Stochastic Neural Network , 1997, Neural Networks.

[129]  Wulfram Gerstner,et al.  Stochastic variational learning in recurrent spiking networks , 2014, Front. Comput. Neurosci..

[130]  Christof Koch,et al.  Unsupervised Learning of Individuals and Categories from Images , 2008, Neural Computation.

[131]  Jürgen Schmidhuber,et al.  Optimal Ordered Problem Solver , 2002, Machine Learning.

[132]  Martin A. Riedmiller,et al.  Deep auto-encoder neural networks in reinforcement learning , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[133]  Emil L. Post Finite combinatory processes—formulation , 1936, Journal of Symbolic Logic.

[134]  Jürgen Schmidhuber,et al.  Fast Online Q(λ) , 1998, Machine Learning.

[135]  Dennis Gabor,et al.  Theory of communication , 1946 .

[136]  Nicol N. Schraudolph,et al.  Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.

[137]  Nicholas T. Carnevale,et al.  Simulation of networks of spiking neurons: A review of tools and strategies , 2006, Journal of Computational Neuroscience.

[138]  Petre Stoica,et al.  Decentralized Control , 2018, The Control Systems Handbook.

[139]  Geoffrey E. Hinton,et al.  The Helmholtz Machine , 1995, Neural Computation.

[140]  Jonas Karlsson,et al.  Learning via task decomposition , 1993 .

[141]  R. Desimone,et al.  Stimulus-selective properties of inferior temporal neurons in the macaque , 1984, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[142]  Christopher Kermorvant,et al.  Dropout Improves Recurrent Neural Networks for Handwriting Recognition , 2013, 2014 14th International Conference on Frontiers in Handwriting Recognition.

[143]  Lee A. Feldkamp,et al.  Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks , 1994, IEEE Trans. Neural Networks.

[144]  Shimon Whiteson,et al.  Evolutionary Computation for Reinforcement Learning , 2012, Reinforcement Learning.

[145]  G. Orban,et al.  Model circuit of spiking neurons generating directional selectivity in simple cells. , 1996, Journal of neurophysiology.

[146]  J. A. Lozano,et al.  Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation , 2001 .

[147]  J. P. Jones,et al.  An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[148]  Brendan J. Frey,et al.  Adaptive dropout for training deep neural networks , 2013, NIPS.

[149]  Andrew W. Moore,et al.  The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces , 1993, Machine Learning.

[150]  Jürgen Schmidhuber,et al.  Recurrent nets that time and count , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[151]  Jürgen Schmidhuber,et al.  Feature Extraction Through LOCOCODE , 1999, Neural Computation.

[152]  Lin Wu,et al.  Learning to play Go using recursive neural networks , 2008, Neural Networks.

[153]  S. Yoshizawa,et al.  An Active Pulse Transmission Line Simulating Nerve Axon , 1962, Proceedings of the IRE.

[154]  Janet Wiles,et al.  Context-free and context-sensitive dynamics in recurrent neural networks , 2000, Connect. Sci..

[155]  Maria S. Kulikova,et al.  Mitosis detection in breast cancer histological images An ICPR 2012 contest , 2013, Journal of pathology informatics.

[156]  D. G. Albrecht,et al.  Spatial frequency selectivity of cells in macaque visual cortex , 1982, Vision Research.

[157]  R. Sutton,et al.  A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.

[158]  Gert Cauwenberghs,et al.  Event-driven contrastive divergence for spiking neuromorphic systems , 2013, Front. Neurosci..

[159]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[160]  Luis A. Plana,et al.  SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[161]  Jürgen Schmidhuber,et al.  A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks , 1992, Neural Computation.

[162]  Zhaoping Li,et al.  Understanding Retinal Color Coding from First Principles , 1992, Neural Computation.

[163]  S. Grossberg Some Networks That Can Learn, Remember, and Reproduce any Number of Complicated Space-Time Patterns, I , 1969 .

[164]  Terrence J. Sejnowski,et al.  Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain , 1992, NIPS.

[165]  Wulfram Gerstner,et al.  Spiking Neuron Models , 2002 .

[166]  Chia-Feng Juang,et al.  A hybrid of genetic algorithm and particle swarm optimization for recurrent network design , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[167]  Martin A. Riedmiller,et al.  Autonomous reinforcement learning on raw visual input data in a real world application , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[168]  Jürgen Schmidhuber,et al.  Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts , 2005 .

[169]  Tobi Delbrück,et al.  Orientation-Selective aVLSI Spiking Neurons , 2001, NIPS.

[170]  Jürgen Schmidhuber,et al.  Classifying Unprompted Speech by Retraining LSTM Nets , 2005, ICANN.

[171]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[172]  Shixin Cheng,et al.  Dynamic learning rate optimization of the backpropagation algorithm , 1995, IEEE Trans. Neural Networks.

[173]  Thomas M. Breuel,et al.  High-Performance OCR for Printed English and Fraktur Using LSTM Networks , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[174]  Shun-ichi Amari,et al.  A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..

[175]  Paul J. Werbos,et al.  Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.

[176]  Thomas G. Dietterich,et al.  Editors. Advances in Neural Information Processing Systems , 2002 .

[177]  Thomas Serre,et al.  On the Role of Object-Specific Features for Real World Object Recognition in Biological Vision , 2002, Biologically Motivated Computer Vision.

[178]  Padhraic Smyth,et al.  Discrete recurrent neural networks for grammatical inference , 1994, IEEE Trans. Neural Networks.

[179]  Jürgen Schmidhuber,et al.  Prototype Resilient, Self-Modeling Robots , 2007, Science.

[180]  Lillian Lee,et al.  Learning of Context-Free Languages: A Survey of the Literature , 1996 .

[181]  Johannes Stallkamp,et al.  The German Traffic Sign Recognition Benchmark: A multi-class classification competition , 2011, The 2011 International Joint Conference on Neural Networks.

[182]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[183]  Fu-Chuang Chen,et al.  Adaptive control of nonlinear systems using neural networks , 1992 .

[184]  Yoram Singer,et al.  The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.

[185]  Kosko Unsupervised learning in noise , 1989 .

[186]  Narendra Ahuja,et al.  Cresceptron: a self-organizing neural network which grows adaptively , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[187]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[188]  Anders Krogh,et al.  A Simple Weight Decay Can Improve Generalization , 1991, NIPS.

[189]  David E. Moriarty,et al.  Symbiotic Evolution of Neural Networks in Sequential Decision Tasks , 1997 .

[190]  Bernd Fritzke,et al.  A Growing Neural Gas Network Learns Topologies , 1994, NIPS.

[191]  Jürgen Schmidhuber,et al.  Netzwerkarchitekturen, Zielfunktionen und Kettenregel , 1993 .

[192]  Xin Yao,et al.  A review of evolutionary artificial neural networks , 1993, Int. J. Intell. Syst..

[193]  Yoshua Bengio,et al.  An Empirical Investigation of Catastrophic Forgeting in Gradient-Based Neural Networks , 2013, ICLR.

[194]  D. Goldfarb A family of variable-metric methods derived by variational means , 1970 .

[195]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[196]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[197]  PAUL J. WERBOS,et al.  Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[198]  Michael C. Mozer,et al.  Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment , 1988, NIPS.

[199]  Yoshua Bengio,et al.  Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery , 2012, ArXiv.

[200]  Ingo Rechenberg,et al.  Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .

[201]  Doina Precup,et al.  Multi-time Models for Temporally Abstract Planning , 1997, NIPS.

[202]  Jürgen Schmidhuber,et al.  Training Recurrent Networks by Evolino , 2007, Neural Computation.

[203]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[204]  Peter Lennie,et al.  Coding of color and form in the geniculostriate visual pathway (invited review). , 2005, Journal of the Optical Society of America. A, Optics, image science, and vision.

[206]  Michael L. Littman,et al.  Algorithms for Sequential Decision Making , 1996 .

[207]  Kumpati S. Narendra,et al.  Identification and control of dynamical systems using neural networks , 1990, IEEE Trans. Neural Networks.

[208]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[209]  D. B. Fogel,et al.  Evolving neural networks , 1990, Biological Cybernetics.

[210]  Narendra Ahuja,et al.  Learning Recognition and Segmentation Using the Cresceptron , 1997, International Journal of Computer Vision.

[211]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[212]  Kenneth Levenberg A METHOD FOR THE SOLUTION OF CERTAIN NON – LINEAR PROBLEMS IN LEAST SQUARES , 1944 .

[213]  Vladimir Vapnik,et al.  Principles of Risk Minimization for Learning Theory , 1991, NIPS.

[214]  Eric Moulines,et al.  A blind source separation technique using second-order statistics , 1997, IEEE Trans. Signal Process..

[215]  Rafal Salustowicz,et al.  Probabilistic Incremental Program Evolution , 1997, Evolutionary Computation.

[216]  M. Kramer Nonlinear principal component analysis using autoassociative neural networks , 1991 .

[217]  Peter L. Bartlett,et al.  Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..

[218]  J. Stephen Judd,et al.  Neural network design and the complexity of learning , 1990, Neural network modeling and connectionism.

[219]  Michael J. Frank,et al.  Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia , 2006, Neural Computation.

[220]  Tom Schaul,et al.  Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients , 2010, ICANN.

[221]  A. G. Ivakhnenko,et al.  Polynomial Theory of Complex Systems , 1971, IEEE Trans. Syst. Man Cybern..

[222]  Reinhold Behringer,et al.  The seeing passenger car 'VaMoRs-P' , 1994, Proceedings of the Intelligent Vehicles '94 Symposium.

[223]  R. Tibshirani,et al.  Generalized additive models for medical research , 1986, Statistical methods in medical research.

[224]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[225]  G. Miller Learning to Forget , 2004, Science.

[226]  Jeffrey L. Elman,et al.  Learning and Evolution in Neural Networks , 1994, Adapt. Behav..

[227]  Wulfram Gerstner,et al.  Reduction of the Hodgkin-Huxley Equations to a Single-Variable Threshold Model , 1997, Neural Computation.

[228]  Yves Deville,et al.  Logic Program Synthesis , 1994, J. Log. Program..

[229]  Julian F. Miller,et al.  Genetic and Evolutionary Computation — GECCO 2003 , 2003, Lecture Notes in Computer Science.

[230]  Danil V. Prokhorov,et al.  Enhanced Multi-Stream Kalman Filter Training for Recurrent Networks , 1998 .

[231]  C. Lee Giles,et al.  Effects of Noise on Convergence and Generalization in Recurrent Networks , 1994, NIPS.

[232]  G. Palm,et al.  On associative memory , 2004, Biological Cybernetics.

[233]  Laurenz Wiskott,et al.  Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells , 2007, PLoS Comput. Biol..

[234]  Anton Schwartz,et al.  A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.

[235]  Sven Behnke,et al.  Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition , 2010, ICANN.

[236]  Douglas Aberdeen,et al.  Policy-Gradient Algorithms for Partially Observable Markov Decision Processes , 2003 .

[237]  P. Werbos Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities , 2006 .

[238]  Satinder P. Singh,et al.  Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.

[239]  Terrence J. Sejnowski,et al.  Tempering Backpropagation Networks: Not All Weights are Created Equal , 1995, NIPS.

[240]  C. E. SHANNON,et al.  A mathematical theory of communication , 1948, MOCO.

[241]  Robert A. Jacobs,et al.  Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.

[242]  Tom Schaul,et al.  Exponential natural evolution strategies , 2010, GECCO '10.

[243]  R. Kurzweil How to Create a Mind: The Secret of Human Thought Revealed , 2012 .

[244]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[245]  Ronald J. Williams,et al.  Training recurrent networks using the extended Kalman filter , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[246]  Jude W. Shavlik,et al.  Combining Symbolic and Neural Learning , 1994, Machine Learning.

[247]  Geoffrey E. Hinton,et al.  The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.

[248]  Harald Haas,et al.  Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication , 2004, Science.

[249]  Martin A. Riedmiller,et al.  A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[250]  Jürgen Schmidhuber,et al.  Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets , 2003, Neural Networks.

[251]  Nikolaus Hansen,et al.  Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.

[252]  T. Kohonen,et al.  Self-organizing semantic maps , 1989, Biological Cybernetics.

[253]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[254]  Yann LeCun,et al.  Improving the convergence of back-propagation learning with second-order methods , 1989 .

[255]  Halbert White,et al.  Learning in Artificial Neural Networks: A Statistical Perspective , 1989, Neural Computation.

[256]  Les E. Atlas,et al.  Recurrent neural networks and robust time series prediction , 1994, IEEE Trans. Neural Networks.

[257]  Garrison W. Cottrell,et al.  Non-Linear Dimensionality Reduction , 1992, NIPS.

[258]  David E. Rumelhart,et al.  Generalization by Weight-Elimination with Application to Forecasting , 1990, NIPS.

[259]  Jürgen Schmidhuber,et al.  Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..

[260]  M. F. Møller,et al.  Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in 0(N) Time , 1993 .

[261]  Geoffrey E. Hinton,et al.  Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[262]  Andrew W. Moore,et al.  Gradient Descent for General Reinforcement Learning , 1998, NIPS.

[263]  S. Haykin Kalman Filtering and Neural Networks , 2001 .

[264]  Risto Miikkulainen,et al.  Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.

[265]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[266]  Paul Rodríguez,et al.  A Recurrent Neural Network that Learns to Count , 1999, Connect. Sci..

[267]  Jürgen Schmidhuber,et al.  An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[268]  L. Abbott,et al.  Competitive Hebbian learning through spike-timing-dependent synaptic plasticity , 2000, Nature Neuroscience.

[269]  Michael I. Jordan,et al.  Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.

[270]  S. Dreyfus The computational solution of optimal control problems with time lag , 1973 .

[271]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[272]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[273]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[274]  Giovanni Soda,et al.  Bidirectional Dynamics for Protein Secondary Structure Prediction , 2001, Sequence Learning.

[275]  Józef Korbicz,et al.  A GMDH neural network-based approach to robust fault diagnosis: Application to the DAMADICS benchmark problem , 2006 .

[276]  Jürgen Schmidhuber,et al.  Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.

[277]  Douglas B. Lenat,et al.  Theory Formation by Heuristic Search , 1983, Artificial Intelligence.

[278]  M. Mozer Discovering Discrete Distributed Representations with Iterative Competitive Learning , 1990, NIPS 1990.

[279]  R. Bellman Dynamic programming. , 1957, Science.

[280]  Risto Miikkulainen,et al.  Efficient Reinforcement Learning through Symbiotic Evolution , 2004 .

[281]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[282]  Grégoire Montavon,et al.  Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.

[283]  D. Perrett,et al.  Visual neurones responsive to faces in the monkey temporal cortex , 2004, Experimental Brain Research.

[284]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[285]  Kenneth O. Stanley,et al.  On the Performance of Indirect Encoding Across the Continuum of Regularity , 2011, IEEE Transactions on Evolutionary Computation.

[286]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[287]  Satosi Watanabe,et al.  Pattern Recognition: Human and Mechanical , 1985 .

[288]  Jürgen Schmidhuber,et al.  Sequential Constant Size Compressors for Reinforcement Learning , 2011, AGI.

[289]  Stefano Nolfi,et al.  How to Evolve Autonomous Robots: Different Approaches in Evolutionary Robotics , 1994 .

[290]  C. S. Wallace,et al.  An Information Measure for Classification , 1968, Comput. J..

[291]  Jürgen Schmidhuber,et al.  A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks , 2006 .

[292]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[293]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[294]  Paul E. Utgoff,et al.  Many-Layered Learning , 2002, Neural Computation.

[295]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[296]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[297]  Jirí Síma,et al.  Training a Single Sigmoidal Neuron Is Hard , 2002, Neural Comput..

[298]  David Barber,et al.  On the Computational Complexity of Stochastic Controller Optimization in POMDPs , 2011, TOCT.

[299]  Dana H. Ballard,et al.  Modular Learning in Neural Networks , 1987, AAAI.

[300]  Steve B. Furber,et al.  Modeling Spiking Neural Networks on SpiNNaker , 2010, Computing in Science & Engineering.

[301]  R. Kempter,et al.  Hebbian learning and spiking neurons , 1999 .

[302]  M. C. Jones,et al.  Spline Smoothing and Nonparametric Regression. , 1989 .

[303]  Tom Schaul,et al.  A linear time natural evolution strategy for non-separable functions , 2011, GECCO.

[304]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[305]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[306]  Barnabás Póczos,et al.  Cross-Entropy Optimization for Independent Process Analysis , 2006, ICA.

[307]  S. Dreyfus The numerical solution of variational problems , 1962 .

[308]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[309]  Yoshinori Sagisaka,et al.  Phoneme boundary estimation using bidirectional recurrent neural networks and its applications , 1999 .

[310]  Wolfgang Maass,et al.  Emergence of complex computational structures from chaotic neural networks through reward-modulated Hebbian learning. , 2014, Cerebral cortex.

[311]  Tara N. Sainath,et al.  Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[312]  J. H. Wilkinson The algebraic eigenvalue problem , 1966 .

[313]  Radford M. Neal,et al.  High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees , 2006, Feature Extraction.

[314]  H. Akaike Statistical predictor identification , 1970 .

[315]  Stanley J. Farlow,et al.  Self-Organizing Methods in Modeling: GMDH Type Algorithms , 1984 .

[316]  Andreas Rauber,et al.  The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data , 2002, IEEE Trans. Neural Networks.

[317]  John N. Tsitsiklis,et al.  Feature-based methods for large scale dynamic programming , 2004, Machine Learning.

[318]  A. Turing On Computable Numbers, with an Application to the Entscheidungsproblem. , 1937 .

[319]  Jürgen Schmidhuber,et al.  Learning Algorithms for Networks with Internal and External Feedback , 1990 .

[320]  Terrence J. Sejnowski,et al.  Graphical Models: Foundations of Neural Computation , 2001, Pattern Anal. Appl..

[321]  Geoffrey E. Hinton,et al.  Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.

[322]  Gert Cauwenberghs,et al.  Neuromorphic Silicon Neuron Circuits , 2011, Front. Neurosci.

[323]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[324]  Jing Peng,et al.  Incremental multi-step Q-learning , 1994, Machine Learning.

[325]  Jürgen Schmidhuber,et al.  Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.

[326]  Derek C. Rose,et al.  Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier] , 2010, IEEE Computational Intelligence Magazine.

[327]  Jürgen Schmidhuber,et al.  An intrinsic value system for developing multiple invariant representations with incremental slowness learning , 2013, Front. Neurorobot..

[328]  Shigenobu Kobayashi,et al.  Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.

[329]  Jordan B. Pollack,et al.  Recursive Distributed Representations , 1990, Artif. Intell..

[330]  W. Senn,et al.  Matching Recall and Storage in Sequence Learning with Spiking Neural Networks , 2013, The Journal of Neuroscience.

[331]  Marco Zorzi,et al.  Emergence of a 'visual number sense' in hierarchical generative models , 2012, Nature Neuroscience.

[332]  Andrew W. Moore,et al.  Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.

[333]  Pierre Baldi,et al.  Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules , 2013, J. Chem. Inf. Model..

[334]  Geoffrey E. Hinton,et al.  Varieties of Helmholtz Machine , 1996, Neural Networks.

[335]  Tadashi Kondo,et al.  Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels , 2008 .

[336]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[337]  Davide Anguita,et al.  An efficient implementation of BP on RISC-based workstations , 1994, Neurocomputing.

[338]  A. S. Weigend,et al.  Results of the time series prediction competition at the Santa Fe Institute , 1993, IEEE International Conference on Neural Networks.

[339]  Jürgen Schmidhuber,et al.  PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem , 2011, Front. Psychol..

[340]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[341]  Bart Kosko,et al.  Unsupervised learning in noise , 1990, International 1989 Joint Conference on Neural Networks.

[342]  Jürgen Schmidhuber,et al.  A committee of neural networks for traffic sign classification , 2011, The 2011 International Joint Conference on Neural Networks.

[343]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[344]  C. Lee Giles,et al.  Extraction of rules from discrete-time recurrent neural networks , 1996, Neural Networks.

[345]  Robert Balzer,et al.  A 15 Year Perspective on Automatic Programming , 1985, IEEE Transactions on Software Engineering.

[346]  Saburo Ikeda,et al.  Sequential GMDH Algorithm and Its Application to River Flow Prediction , 1976, IEEE Transactions on Systems, Man, and Cybernetics.

[347]  D I Perrett,et al.  Organization and functions of cells responsive to faces in the temporal cortex. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[348]  H. Sebastian Seung,et al.  Natural Image Denoising with Convolutional Networks , 2008, NIPS.

[349]  Jürgen Schmidhuber,et al.  Accelerated learning in back-propagation nets , 1989 .

[350]  Jürgen Schmidhuber,et al.  Incremental Slow Feature Analysis: Adaptive Low-Complexity Slow Feature Updating from High-Dimensional Input Streams , 2012, Neural Computation.

[351]  Naonori Ueda,et al.  Optimal Linear Combination of Neural Networks for Improving Classification Performance , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[352]  Peter Craven,et al.  Smoothing noisy data with spline functions , 1978 .

[353]  Nicolas Brunel,et al.  Dynamics of a recurrent network of spiking neurons before and following learning , 1997 .

[354]  Johannes Schemmel,et al.  Implementing Synaptic Plasticity in a VLSI Spiking Neural Network Model , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[355]  Jürgen Schmidhuber,et al.  A robot that reinforcement-learns to identify and memorize important previous observations , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[356]  Eduardo Sontag,et al.  Turing computability with neural nets , 1991 .

[357]  Guozhong An,et al.  The Effects of Adding Noise During Backpropagation Training on a Generalization Performance , 1996, Neural Computation.

[358]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[359]  Fred Henrik Hamker,et al.  Learning Invariance from Natural Images Inspired by Observations in the Primary Visual Cortex , 2012, Neural Computation.

[360]  R. FitzHugh Impulses and Physiological States in Theoretical Models of Nerve Membrane. , 1961, Biophysical journal.

[361]  Stefano Nolfi,et al.  Evolving mobile robots in simulated and real environments , 1995 .

[362]  Kenneth O. Stanley,et al.  A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks , 2009, Artificial Life.

[363]  Kazumi Saito,et al.  Partial BFGS Update and Efficient Step-Length Calculation for Three-Layer Neural Networks , 1997, Neural Computation.

[364]  Robert Babuska,et al.  A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[365]  Luca Maria Gambardella,et al.  Fast image scanning with deep max-pooling convolutional neural networks , 2013, 2013 IEEE International Conference on Image Processing.

[366]  Robert A. Legenstein,et al.  Reinforcement Learning on Slow Features of High-Dimensional Input Streams , 2010, PLoS Comput. Biol..

[367]  Risto Miikkulainen,et al.  Active Guidance for a Finless Rocket Using Neuroevolution , 2003, GECCO.

[368]  Andrzej Cichocki,et al.  Neural networks for optimization and signal processing , 1993 .

[369]  Kumpati S. Narendra,et al.  Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..

[370]  Wolfgang Maass,et al.  Emergence of Dynamic Memory Traces in Cortical Microcircuit Models through STDP , 2013, The Journal of Neuroscience.

[371]  Geoffrey E. Hinton,et al.  Learning and relearning in Boltzmann machines , 1986 .

[372]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[373]  M. Stemmler A single spike suffices: the simplest form of stochastic resonance in model neurons , 1996 .

[374]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[375]  David H. Wolpert,et al.  Bayesian Backpropagation Over I-O Functions Rather Than Weights , 1993, NIPS.

[376]  Luca Maria Gambardella,et al.  Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.

[377]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[378]  Jan Peters Policy gradient methods , 2010, Scholarpedia.

[379]  F. Vallet,et al.  Robustness in Multilayer Perceptrons , 1993, Neural Computation.

[380]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[381]  Peter Stone,et al.  Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[382]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[383]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[384]  Sridhar Mahadevan,et al.  Hierarchical Policy Gradient Algorithms , 2003, ICML.

[385]  Erkki Oja,et al.  Neural Networks, Principal Components, and Subspaces , 1989, Int. J. Neural Syst..

[386]  Alex Graves,et al.  Practical Variational Inference for Neural Networks , 2011, NIPS.

[387]  Volkmar Frinken,et al.  Long-short term memory neural networks language modeling for handwriting recognition , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[388]  Alan S. Lapedes,et al.  A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition , 1986 .

[389]  José Carlos Príncipe,et al.  A Theory for Neural Networks with Time Delays , 1990, NIPS.

[390]  Eric Saund,et al.  Unsupervised Learning of Mixtures of Multiple Causes in Binary Data , 1993, NIPS.

[391]  Jürgen Schmidhuber,et al.  A local learning algorithm for dynamic feedforward and recurrent networks , 1990, Forschungsberichte, TU Munich.

[392]  Yann LeCun,et al.  A theoretical framework for back-propagation , 1988 .

[393]  Henry Markram,et al.  Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.

[394]  Stephen Grossberg,et al.  Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions , 1976, Biological Cybernetics.

[395]  Wolfgang Maass,et al.  Bayesian Computation Emerges in Generic Cortical Microcircuits through Spike-Timing-Dependent Plasticity , 2013, PLoS Comput. Biol..

[396]  Volkmar Frinken,et al.  Mode Detection in Online Handwritten Documents Using BLSTM Neural Networks , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[397]  Gregory J. Chaitin,et al.  On the Length of Programs for Computing Finite Binary Sequences , 1966, JACM.

[398]  Tobi Delbrück,et al.  CAVIAR: A 45k Neuron, 5M Synapse, 12G Connects/s AER Hardware Sensory–Processing–Learning–Actuating System for High-Speed Visual Object Recognition and Tracking , 2009, IEEE Transactions on Neural Networks.

[399]  Sridhar Mahadevan,et al.  Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.

[400]  Kee-Eung Kim,et al.  Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.

[401]  Alekseĭ Grigorʹevich Ivakhnenko,et al.  CYBERNETIC PREDICTING DEVICES , 1966 .

[402]  Tom Schaul,et al.  Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[403]  Jürgen Schmidhuber,et al.  My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013 , 2013, ArXiv.

[404]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[405]  Pierre Baldi,et al.  Hybrid Modeling, HMM/NN Architectures, and Protein Applications , 1996, Neural Computation.

[406]  Jürgen Schmidhuber,et al.  Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.

[407]  Karl Sims,et al.  Evolving virtual creatures , 1994, SIGGRAPH.

[408]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[409]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[410]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[411]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[412]  Volkmar Frinken,et al.  Keyword Spotting in Online Handwritten Documents Containing Text and Non-text Using BLSTM Neural Networks , 2011, 2011 International Conference on Document Analysis and Recognition.

[413]  Alexander H. Waibel,et al.  The Tempo 2 Algorithm: Adjusting Time-Delays By Supervised Learning , 1990, NIPS.

[414]  Joachim Diederich,et al.  Survey and critique of techniques for extracting rules from trained artificial neural networks , 1995, Knowl. Based Syst..

[415]  C. G. Broyden A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .

[416]  Christian Osendorfer,et al.  On Fast Dropout and its Applicability to Recurrent Networks , 2013, ICLR.

[417]  Julian Togelius,et al.  The 2009 Simulated Car Racing Championship , 2010, IEEE Transactions on Computational Intelligence and AI in Games.

[418]  Barak A. Pearlmutter Learning State Space Trajectories in Recurrent Neural Networks , 1989, Neural Computation.

[419]  Bruce W. Schmeiser,et al.  Improving model accuracy using optimal linear combinations of trained neural networks , 1995, IEEE Trans. Neural Networks.

[420]  Jude W. Shavlik,et al.  Combining the Predictions of Multiple Classifiers: Using Competitive Learning to Initialize Neural Networks , 1995, IJCAI.

[421]  Lorien Y. Pratt,et al.  Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.

[422]  Jürgen Schmidhuber,et al.  Transfer learning for Latin and Chinese characters with Deep Neural Networks , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[423]  Jürgen Schmidhuber,et al.  Gödel Machines: Fully Self-referential Optimal Universal Self-improvers , 2007, Artificial General Intelligence.

[424]  Arthur L. Samuel,et al.  Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..

[425]  Tom Schaul,et al.  The two-dimensional organization of behavior , 2011, 2011 IEEE International Conference on Development and Learning (ICDL).

[426]  Aude Billard,et al.  From Animals to Animats , 2004 .

[427]  Kurt Hornik,et al.  Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[428]  Yoshua Bengio,et al.  Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.

[429]  Helge J. Ritter,et al.  Three-dimensional neural net for learning visuomotor coordination of a robot arm , 1990, IEEE Trans. Neural Networks.

[430]  Mark A. Pitt,et al.  Advances in Minimum Description Length: Theory and Applications , 2005 .

[431]  Paul J. Werbos,et al.  Applications of advances in nonlinear sensitivity analysis , 1982 .

[432]  J. Schmidhuber An 'introspective' network that can learn to run its own weight change algorithm , 1993 .

[433]  David Haussler,et al.  Occam's Razor , 1987, Inf. Process. Lett..

[434]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[435]  Christopher D. Manning,et al.  Fast dropout training , 2013, ICML.

[436]  D. Marquardt An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .

[437]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[438]  Astro Teller,et al.  The evolution of mental models , 1994 .

[439]  A. Church An Unsolvable Problem of Elementary Number Theory , 1936 .

[440]  Yaroslav Bulatov,et al.  Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[441]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[442]  Shih-Chii Liu,et al.  Minitaur, an Event-Driven FPGA-Based Spiking Network Accelerator , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[443]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[444]  A. Norman Redlich,et al.  Redundancy Reduction as a Strategy for Unsupervised Learning , 1993, Neural Computation.

[445]  Oren Etzioni,et al.  Explanation-Based Learning: A Problem Solving Perspective , 1989, Artif. Intell..

[446]  Eugene M. Izhikevich,et al.  Simple model of spiking neurons , 2003, IEEE Trans. Neural Networks.

[447]  Geoffrey E. Hinton,et al.  A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.

[448]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[449]  F. Pasemann,et al.  Evolving structure and function of neurocontrollers , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[450]  Wolfgang Maass,et al.  Networks of spiking neurons: the third generation of neural network models , 1997 .

[451]  Frank Bärmann,et al.  A learning algorithm for multilayered neural networks based on linear least squares problems , 1993, Neural Networks.

[452]  Michael J. Carter,et al.  Operational Fault Tolerance of CMAC Networks , 1989, NIPS.

[453]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[454]  Jordan B. Pollack,et al.  RAAM for infinite context-free languages , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[455]  A. Hodgkin,et al.  A quantitative description of membrane current and its application to conduction and excitation in nerve , 1952, The Journal of physiology.

[456]  Jürgen Schmidhuber,et al.  Learning Factorial Codes by Predictability Minimization , 1992, Neural Computation.

[457]  Jürgen Schmidhuber,et al.  Compete to Compute , 2013, NIPS.

[458]  Pierre Baldi,et al.  Neural Networks for Fingerprint Recognition , 1993, Neural Computation.

[459]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[460]  Randall C. O'Reilly,et al.  Biologically Plausible Error-Driven Learning Using Local Activation Differences: The Generalized Recirculation Algorithm , 1996, Neural Computation.

[461]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[462]  Jürgen Schmidhuber,et al.  Flat Minima , 1997, Neural Computation.

[463]  Jürgen Schmidhuber,et al.  Evolving neural networks in compressed weight space , 2010, GECCO '10.

[464]  H. G. Schuster Learning by maximizing the information transfer through nonlinear noisy neurons and "noise breakdown" , 1992 .

[465]  Geoffrey E. Hinton,et al.  Keeping Neural Networks Simple , 1993 .

[466]  C. Malsburg Self-organization of orientation sensitive cells in the striate cortex , 2004, Kybernetik.

[467]  Ansgar Heinrich Ludolf West,et al.  Adaptive Back-Propagation in On-Line Learning of Multilayer Networks , 1995, NIPS.

[468]  S. Grossberg,et al.  Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors , 1976, Biological Cybernetics.

[469]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[470]  Anitha Pasupathy,et al.  Transformation of shape information in the ventral pathway , 2007, Current Opinion in Neurobiology.

[471]  Danil V. Prokhorov,et al.  Adaptive behavior with fixed weights in RNN: an overview , 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).

[472]  D. O. Hebb,et al.  The organization of behavior , 1988 .

[473]  S. Linnainmaa Taylor expansion of the accumulated rounding error , 1976 .

[474]  Terence D. Sanger,et al.  An Optimality Principle for Unsupervised Learning , 1988, NIPS.

[475]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[476]  Qingxiang Wu,et al.  A Novel Approach for the Implementation of Large Scale Spiking Neural Networks on FPGA Hardware , 2005, IWANN.

[477]  L. Bobrowski Learning processes in multilayer threshold nets , 1978, Biological Cybernetics.

[478]  Jürgen Schmidhuber,et al.  Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks , 2007, IJCAI.

[479]  J. Nadal,et al.  Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer , 1994, Network.

[480]  Ronald J. Williams,et al.  Experimental Analysis of the Real-time Recurrent Learning Algorithm , 1989 .

[481]  Inman Harvey,et al.  Evolving Recurrent Dynamical Networks for Robot Control , 1993 .

[482]  M. Gherrity,et al.  A learning algorithm for analog, fully recurrent neural networks , 1989, International 1989 Joint Conference on Neural Networks.

[483]  Pineda,et al.  Generalization of back-propagation to recurrent neural networks. , 1987, Physical review letters.

[484]  C. Malsburg,et al.  How patterned neural connections can be set up by self-organization , 1976, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[485]  Teuvo Kohonen,et al.  Correlation Matrix Memories , 1972, IEEE Transactions on Computers.

[486]  J. Rissanen Stochastic Complexity and Modeling , 1986 .

[487]  Tadashi Kondo,et al.  GMDH neural network algorithm using the heuristic self-organization method and its application to the pattern identification problem , 1998, Proceedings of the 37th SICE Annual Conference. International Session Papers.

[488]  Pierre-Yves Oudeyer,et al.  Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[489]  Barak A. Pearlmutter Fast Exact Multiplication by the Hessian , 1994, Neural Computation.

[490]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[491]  Goldberg,et al.  Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[492]  Kunihiko Fukushima,et al.  Increasing robustness against background noise: Visual pattern recognition by a neocognitron , 2011, Neural Networks.

[493]  Barak A. Pearlmutter,et al.  G-maximization: An unsupervised learning procedure for discovering regularities , 1987 .

[494]  Mitsuo Kawato,et al.  Inter-module credit assignment in modular reinforcement learning , 2003, Neural Networks.

[495]  Justus H. Piater,et al.  Closed-Loop Learning of Visual Control Policies , 2011, J. Artif. Intell. Res..

[496]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[497]  K. D. Miller A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between ON- and OFF-center inputs , 1994, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[498]  Tom Schaul,et al.  No more pesky learning rates , 2012, ICML.

[499]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[500]  Nuttapong Chentanez,et al.  Intrinsically Motivated Learning of Hierarchical Collections of Skills , 2004 .

[501]  Tao Wang,et al.  Deep learning with COTS HPC systems , 2013, ICML.

[502]  Nils J. Nilsson,et al.  Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[503]  Jürgen Schmidhuber,et al.  Unsupervised Learning in LSTM Recurrent Neural Networks , 2001, ICANN.

[504]  Jing Peng,et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.

[505]  Richard C. T. Lee,et al.  PROW: A Step Toward Automatic Program Writing , 1969, IJCAI.

[506]  D. Gabor,et al.  Theory of communication. Part 1: The analysis of information , 1946 .

[507]  Nichael Lynn Cramer,et al.  A Representation for the Adaptive Generation of Simple Sequential Programs , 1985, ICGA.

[508]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[509]  Jochen J. Steil,et al.  Online reservoir adaptation by intrinsic plasticity for backpropagation-decorrelation and echo state learning , 2007, Neural Networks.

[510]  Jianlin Cheng,et al.  NNcon: improved protein contact map prediction using 2D-recursive neural networks , 2009, Nucleic Acids Res..

[511]  Keiji Tanaka,et al.  Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. , 1994, Journal of neurophysiology.

[512]  Luca Maria Gambardella,et al.  Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[513]  marquis de L'Hospital Analyse des infiniment petits, pour l'intelligence des lignes courbes , 1970 .

[514]  Hans-Georg Zimmermann,et al.  Forecasting with Recurrent Neural Networks: 12 Tricks , 2012, Neural Networks: Tricks of the Trade.

[515]  Jürgen Schmidhuber,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[516]  Julian Togelius,et al.  Evolving Memory Cell Structures for Sequence Learning , 2009, ICANN.

[517]  Shimon Whiteson,et al.  Evolutionary Function Approximation for Reinforcement Learning , 2006, J. Mach. Learn. Res..

[518]  K. Gödel Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I , 1931 .

[519]  Jürgen Schmidhuber,et al.  Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[520]  M. A. Andrade,et al.  Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network. , 1993, Protein engineering.

[521]  Hiroaki Kitano,et al.  Designing Neural Networks Using Genetic Algorithms with Graph Generation System , 1990, Complex Syst..

[522]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[523]  Elliot Soloway,et al.  Learning to program = learning to construct mechanisms and explanations , 1986, CACM.

[524]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[525]  Gerald DeJong,et al.  Explanation-Based Learning: An Alternative View , 2005, Machine Learning.

[526]  Peter Tiño,et al.  Learning long-term dependencies in NARX recurrent neural networks , 1996, IEEE Trans. Neural Networks.

[527]  Jürgen Schmidhuber,et al.  State-Dependent Exploration for Policy Gradient Methods , 2008, ECML/PKDD.

[528]  Jürgen Schmidhuber,et al.  Semilinear Predictability Minimization Produces Well-Known Feature Detectors , 1996, Neural Computation.

[529]  Barak A. Pearlmutter Gradient calculations for dynamic recurrent neural networks: a survey , 1995, IEEE Trans. Neural Networks.

[530]  L. C. Baird,et al.  Reinforcement learning in continuous time: advantage updating , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).

[531]  Mike Schuster,et al.  On supervised learning from sequential data with applications for speech recognition , 1999 .

[532]  Ian H. Witten,et al.  Stacked generalization: when does it work? , 1997, IJCAI 1997.

[533]  Jürgen Schmidhuber,et al.  Solving POMDPs with Levin Search and EIRA , 1996, ICML.

[534]  D. Zipser,et al.  A spiking network model of short-term active memory , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[535]  Daniele Loiacono,et al.  Simulated Car Racing Championship: Competition Software Manual , 2013, ArXiv.

[536]  Nicolas Brunel,et al.  Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons , 2000, Journal of Computational Neuroscience.

[537]  B. Widrow,et al.  The truck backer-upper: an example of self-learning in neural networks , 1989, International 1989 Joint Conference on Neural Networks.

[538]  Christian Jacob,et al.  Genetic L-System Programming , 1994, PPSN.

[539]  J. Rubner,et al.  Development of feature detectors by self-organization , 2004, Biological Cybernetics.

[540]  Mitsuo Kawato,et al.  Neural network control for a closed-loop System using Feedback-error-learning , 1993, Neural Networks.

[541]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[542]  Luca Maria Gambardella,et al.  Learning Fine Motion by Using the Hierarchical Extended Kohonen Map , 1996, ICANN.

[543]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[544]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[545]  Jun Morimoto,et al.  Robust Reinforcement Learning , 2005, Neural Computation.

[546]  A. P. Wieland,et al.  Evolving neural network controllers for unstable systems , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[547]  R. L. Stratonovich CONDITIONAL MARKOV PROCESSES , 1960 .

[548]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[549]  Achilleas Zapranis,et al.  Stock performance modeling using neural networks: A comparative study with regression models , 1994, Neural Networks.

[550]  Pierre Baldi,et al.  The dropout learning algorithm , 2014, Artif. Intell..

[551]  R. E. Kalman,et al.  A New Approach to Linear Filtering and Prediction Problems , 2002 .

[552]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[553]  John E. Moody,et al.  Fast Pruning Using Principal Components , 1993, NIPS.

[554]  David J. Field,et al.  What Is the Goal of Sensory Coding? , 1994, Neural Computation.

[555]  Jürgen Schmidhuber,et al.  Self-Delimiting Neural Networks , 2012, ArXiv.

[556]  Stephen A. Cook,et al.  The complexity of theorem-proving procedures , 1971, STOC.

[557]  L. S. Pontryagin,et al.  Mathematical Theory of Optimal Processes , 1962 .

[558]  Stephen F. Smith,et al.  A learning system based on genetic adaptive algorithms , 1980 .

[559]  Gert Cauwenberghs,et al.  A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization , 1992, NIPS.

[560]  Andrzej Cichocki,et al.  A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.

[561]  Paul J. Werbos,et al.  Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[562]  John F. Kolen,et al.  Field Guide to Dynamical Recurrent Networks , 2001 .

[563]  Tim Curran,et al.  The Limits of Feedforward Vision: Recurrent Processing Promotes Robust Object Recognition when Objects Are Degraded , 2012, Journal of Cognitive Neuroscience.

[564]  A. E. Bryson,et al.  A Steepest-Ascent Method for Solving Optimum Programming Problems , 1962 .

[565]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[566]  Wray L. Buntine,et al.  Bayesian Back-Propagation , 1991, Complex Syst..

[567]  B. McNaughton,et al.  Population dynamics and theta rhythm phase precession of hippocampal place cell firing: A spiking neuron model , 1998, Hippocampus.

[568]  Keechul Jung,et al.  GPU implementation of neural networks , 2004, Pattern Recognit..

[569]  Kyunghyun Cho,et al.  Foundations and Advances in Deep Learning , 2014 .

[570]  Bram Bakker,et al.  Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization , 2003 .

[571]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[572]  Günther Palm,et al.  On the Information Storage Capacity of Local Learning Rules , 1992, Neural Computation.

[573]  Scott E. Fahlman,et al.  An empirical study of learning speed in back-propagation networks , 1988 .

[574]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[575]  Pierre Baldi,et al.  The Principled Design of Large-Scale Recursive Neural Network Architectures--DAG-RNNs and the Protein Structure Prediction Problem , 2003, J. Mach. Learn. Res..

[576]  Jeremy Buhler,et al.  Efficient large-scale sequence comparison by locality-sensitive hashing , 2001, Bioinform..

[577]  Jürgen Schmidhuber,et al.  On Fast Deep Nets for AGI Vision , 2011, AGI.

[578]  P. J. Werbos,et al.  Backpropagation and neurocontrol: a review and prospectus , 1989, International 1989 Joint Conference on Neural Networks.

[579]  J. J. Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[580]  Gomes de Freitas,et al.  Bayesian methods for neural networks , 2000 .

[581]  Tom Schaul,et al.  Efficient natural evolution strategies , 2009, GECCO.

[582]  Tapani Raiko,et al.  Enhanced Gradient for Training Restricted Boltzmann Machines , 2013, Neural Computation.

[583]  Panagiotis Manolios,et al.  First-Order Recurrent Neural Networks and Deterministic Finite State Automata , 1994, Neural Computation.

[584]  Michael I. Jordan Supervised learning and systems with excess degrees of freedom , 1988 .

[585]  Henry Markram,et al.  Neural Networks with Dynamic Synapses , 1998, Neural Computation.

[586]  Jirí Síma,et al.  Loading Deep Networks Is Hard , 1994, Neural Comput..

[587]  Richard S. Sutton,et al.  GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.

[588]  Osamu Watanabe,et al.  Kolmogorov Complexity and Computational Complexity , 2012, EATCS Monographs on Theoretical Computer Science.

[589]  Jürgen Schmidhuber,et al.  A fast learning algorithm for image segmentation with max-pooling convolutional networks , 2013, 2013 IEEE International Conference on Image Processing.

[590]  Christian Jutten,et al.  Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[591]  F. Rosenblatt,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[592]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[593]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[594]  Jürgen Schmidhuber,et al.  Co-evolving recurrent neurons learn deep memory POMDPs , 2005, GECCO '05.

[595]  Faustino J. Gomez,et al.  Intrinsically Motivated Evolutionary Search for Vision-Based Reinforcement Learning , 2011 .

[596]  Christian Igel,et al.  Neuroevolution for reinforcement learning using evolution strategies , 2003, The 2003 Congress on Evolutionary Computation, 2003. CEC '03..

[597]  Helko Lehmann,et al.  Computation in Recurrent Neural Networks: From Counters to Iterated Function Systems , 1998, Australian Joint Conference on Artificial Intelligence.

[598]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[599]  Bruno A. Olshausen,et al.  Inferring Sparse, Overcomplete Image Codes Using an Efficient Coding Framework , 1998, NIPS.

[600]  Roberto Battiti First- and second-order methods for learning , 1992 .

[601]  Ronald L. Rivest,et al.  Training a 3-node neural network is NP-complete , 1988, COLT '88.

[602]  A. A. Mullin,et al.  Principles of neurodynamics , 1962 .

[603]  Tao Zhang,et al.  Stable Adaptive Neural Network Control , 2001, The Springer International Series on Asian Studies in Computer and Information Science.

[604]  Leslie Pack Kaelbling,et al.  Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.

[605]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[606]  Babak Hassibi,et al.  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.

[607]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[608]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[609]  Pierre Baldi,et al.  Deep architectures for protein contact map prediction , 2012, Bioinform..

[610]  Geoffrey E. Hinton,et al.  Phone recognition using Restricted Boltzmann Machines , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[611]  Steven J. Bradtke,et al.  Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[612]  P. Werbos,et al.  Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[613]  Tony Plate,et al.  Holographic Recurrent Networks , 1992, NIPS.

[614]  H. Akaike A new look at the statistical model identification , 1974 .

[615]  Roger Fletcher,et al.  A Rapidly Convergent Descent Method for Minimization , 1963, Comput. J..

[616]  Vittorio Maniezzo,et al.  Genetic evolution of the topology and weight distribution of neural networks , 1994, IEEE Trans. Neural Networks.

[617]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[618]  Yoshua Bengio,et al.  Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.

[619]  Robert A. Legenstein,et al.  Neural circuits for pattern recognition with small total wire length , 2002, Theor. Comput. Sci..

[620]  Andrew G. Barto,et al.  Skill Characterization Based on Betweenness , 2008, NIPS.

[621]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[622]  Joseph F. Murray,et al.  Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation , 2010, Neural Computation.

[623]  Pierre Baldi,et al.  Autoencoders, Unsupervised Learning, and Deep Architectures , 2011, ICML Unsupervised and Transfer Learning.

[624]  Luís B. Almeida,et al.  A learning rule for asynchronous perceptrons with feedback in a combinatorial environment , 1990 .

[625]  Yoshua Bengio,et al.  Artificial neural networks and their application to sequence recognition , 1991 .

[626]  Andrés Pérez Uribe,et al.  Structure-Adaptable Digital Neural Networks , 1999 .

[627]  Yann LeCun,et al.  Traffic sign recognition with multi-scale Convolutional Networks , 2011, The 2011 International Joint Conference on Neural Networks.

[628]  Reinhard Männer,et al.  Multiprocessor And Memory Architecture Of The Neurocomputer Synapse-1 , 1993, Int. J. Neural Syst..

[629]  Dario Floreano,et al.  Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot , 2003, NASA/DoD Conference on Evolvable Hardware, 2003. Proceedings..

[630]  Dario Floreano,et al.  From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior , 2000, Journal of Cognitive Neuroscience.

[631]  D. Mackay,et al.  Analysis of Linsker's application of Hebbian rules to linear networks , 1990 .

[632]  John E. Moody,et al.  Fast Learning in Multi-Resolution Hierarchies , 1988, NIPS.

[633]  A. K. Rigler,et al.  Accelerating the convergence of the back-propagation method , 1988, Biological Cybernetics.

[634]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[635]  Schuster,et al.  Separation of a mixture of independent signals using time delayed correlations. , 1994, Physical review letters.

[636]  James Martens,et al.  Deep learning via Hessian-free optimization , 2010, ICML.

[637]  Jürgen Schmidhuber,et al.  Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[638]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[639]  Jürgen Schmidhuber,et al.  Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[640]  H. B. Barlow,et al.  Unsupervised Learning , 1989, Neural Computation.

[641]  Sebastian Otte,et al.  Local Feature Based Online Mode Detection with Recurrent Neural Networks , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[642]  Jude W. Shavlik,et al.  Using knowledge-based neural networks to improve algorithms: Refining the Chou-Fasman algorithm for protein folding , 2004, Machine Learning.

[643]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[644]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[645]  Leemon C. Baird,et al.  Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[646]  Ralph Linsker,et al.  Self-organization in a perceptual network , 1988, Computer.

[647]  H. B. Barlow,et al.  Finding Minimum Entropy Codes , 1989, Neural Computation.

[648]  Pedro M. Domingos,et al.  Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[649]  Wolfgang Maass,et al.  Lower Bounds for the Computational Power of Networks of Spiking Neurons , 1996, Neural Computation.

[650]  K. S. Narendra,et al.  Control of nonlinear dynamical systems using neural networks. II. Observability, identification, and control , 1996, IEEE Trans. Neural Networks.

[651]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[652]  R. Vaillant,et al.  An original approach for the localization of objects in images , 1993 .

[653]  Stefan Schaal,et al.  Reinforcement learning of motor skills with policy gradients , 2008 .

[654]  Tapani Raiko,et al.  Tikhonov-Type Regularization for Restricted Boltzmann Machines , 2012, ICANN.

[655]  Mike Casey,et al.  The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction , 1996, Neural Computation.

[656]  Jude W. Shavlik,et al.  Knowledge-Based Artificial Neural Networks , 1994, Artif. Intell..

[657]  Alekseĭ Grigorʹevich Ivakhnenko,et al.  Cybernetics and forecasting techniques , 1967 .

[658]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[659]  Barak A. Pearlmutter,et al.  Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors , 1992, NIPS 1992.

[660]  Bernard Widrow,et al.  Neural networks: applications in industry, business and science , 1994, CACM.

[661]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[662]  David Windisch Loading Deep Networks Is Hard: The Pyramidal Case , 2005, Neural Computation.

[663]  Tobi Delbruck,et al.  Real-time classification and sensor fusion with a spiking deep belief network , 2013, Front. Neurosci..

[664]  Maja J. Matarić,et al.  Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .

[665]  Suzanna Becker,et al.  Unsupervised Learning Procedures for Neural Networks , 1991, Int. J. Neural Syst..

[666]  Ilya Sutskever,et al.  Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[667]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[668]  Terrence J. Sejnowski,et al.  An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.

[669]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[670]  Anton Gunzinger,et al.  Fast neural net simulation with a DSP processor array , 1995, IEEE Trans. Neural Networks.

[671]  Alfonso Valencia,et al.  A hierarchical unsupervised growing neural network for clustering gene expression patterns , 2001, Bioinform..

[672]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[673]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[674]  Michail G. Lagoudakis,et al.  Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[675]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[676]  Frank Sehnke,et al.  Parameter-exploring policy gradients , 2010, Neural Networks.

[677]  Mohammed Bennamoun,et al.  Automatic Feature Learning for Robust Shadow Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[678]  Jürgen Schmidhuber Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability , 1995, ICML.

[679]  Christian W. Omlin,et al.  A Machine Learning Method for Extracting Symbolic Knowledge from Recurrent Neural Networks , 2004, Neural Computation.

[680]  Shie Mannor,et al.  Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning , 2002, ECML.

[681]  Guo-Zheng Sun,et al.  Time Warping Invariant Neural Networks , 1992, NIPS.

[682]  Gerhard Weiß,et al.  Hierarchical Chunking in Classifier Systems , 1994, AAAI.

[683]  Danil V. Prokhorov,et al.  A Convolutional Learning System for Object Classification in 3-D Lidar Data , 2010, IEEE Transactions on Neural Networks.

[684]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[685]  Frank Fallside,et al.  Dynamic reinforcement driven error propagation networks with application to game playing , 1989 .

[686]  Bart L. M. Happel,et al.  Design and evolution of modular neural network architectures , 1994, Neural Networks.

[687]  Wolfgang Maass,et al.  On the Computational Power of Winner-Take-All , 2000, Neural Computation.

[688]  HighWire Press Philosophical Transactions of the Royal Society of London , 1781, The London Medical Journal.

[689]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[690]  Maneesh Sahani,et al.  Regularization and nonlinearities for neural language models: when are they needed? , 2013, ArXiv.

[691]  Patrice Y. Simard,et al.  High Performance Convolutional Neural Networks for Document Processing , 2006 .

[692]  Ha Hong,et al.  Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream , 2013, NIPS.

[693]  Shumeet Baluja,et al.  A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning , 1994 .

[694]  A. Lindenmayer Mathematical models for cellular interactions in development. I. Filaments with one-sided inputs. , 1968, Journal of theoretical biology.

[695]  Radford M. Neal Classification with Bayesian Neural Networks , 2005, MLCW.

[696]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[697]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[698]  Risto Miikkulainen,et al.  Evolving Keepaway Soccer Players through Task Decomposition , 2003, GECCO.

[699]  Andreas Rauber,et al.  The growing hierarchical self-organizing map , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[700]  Alan J. Gross,et al.  Self-Organizing Methods in Modeling , 1988 .

[701]  George M. Siouris,et al.  Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.