Paper information - Deep Learning

Deep Learning

Machine-learning technology powers many aspects of modern society: from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users’ interests, and select relevant results of search. Increasingly, these applications make use of a class of techniques called deep learning.

Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input.

Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations. An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions. The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.

Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government. In addition to beating records in image recognition and speech recognition, it has beaten other machine-learning techniques at predicting the activity of potential drug molecules, analysing particle accelerator data, reconstructing brain circuits, and predicting the effects of mutations in non-coding DNA on gene expression and disease. Perhaps more surprisingly, deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering and language translation.

We think that deep learning will have many more successes in the near future because it requires very little engineering by hand, so it can easily take advantage of increases in the amount of available computation and data. New learning algorithms and architectures that are currently being developed for deep neural networks will only accelerate this progress.
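The idea of composing simple but non-linear modules, each turning one level of representation into the next, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the layer sizes, the choice of a rectified linear unit as the non-linearity, the helper names (`relu`, `affine`, `make_layer`, `forward`), and above all the weights, which are random and untrained here, whereas in deep learning they would be learned from data.

```python
import random


def relu(v):
    """Element-wise rectified linear unit: max(0, x)."""
    return [max(0.0, x) for x in v]


def affine(weights, bias, v):
    """Compute weights @ v + bias for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def make_layer(n_in, n_out, rng):
    """Build one module with random, UNTRAINED weights (hypothetical helper)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, bias


def forward(layers, v, nonlinearity=relu):
    """Pass the raw input through each module and keep every level's representation."""
    representations = [v]
    for weights, bias in layers:
        v = nonlinearity(affine(weights, bias, v))
        representations.append(v)
    return representations


rng = random.Random(0)
layer_sizes = [4, 8, 8, 3]  # assumed sizes: raw input, two hidden levels, output
layers = [make_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

raw_input = [0.2, -0.5, 0.9, 0.1]  # stands in for, say, four pixel values
levels = forward(layers, raw_input)
for depth, rep in enumerate(levels):
    print(f"level {depth}: {len(rep)} values")
```

The non-linearity is what makes depth useful in this sketch: if `nonlinearity` is replaced by the identity function, the whole stack collapses into a single affine map of the input, however many modules are composed.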

[1] O. Perron. Zur Theorie der Matrices , 1907 .

[2] Student,et al. THE PROBABLE ERROR OF A MEAN , 1908 .

[3] R. Fisher. THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[4] Kenneth Levenberg. A METHOD FOR THE SOLUTION OF CERTAIN NON – LINEAR PROBLEMS IN LEAST SQUARES , 1944 .

[5] F ROSENBLATT,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[6] Robert Price,et al. A useful theorem for nonlinear devices having Gaussian inputs , 1958, IRE Trans. Inf. Theory.

[7] D. Hubel,et al. Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[8] J. B. Rosen. The Gradient Projection Method for Nonlinear Programming. Part I. Linear Constraints , 1960 .

[9] Henry J. Kelley,et al. Gradient Theory of Optimal Flight Paths , 1960 .

[10] J. B. Rosen. The gradient projection method for nonlinear programming: Part II , 1961 .

[11] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[12] S. Dreyfus. The numerical solution of variational problems , 1962 .

[13] A. E. Bryson,et al. A Steepest-Ascent Method for Solving Optimum Programming Problems , 1962 .

[14] A. A. Mullin,et al. Principles of neurodynamics , 1962 .

[15] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .

[16] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .

[17] G. Bonnet. Transformations des signaux aléatoires a travers les systèmes non linéaires sans mémoire , 1964 .

[18] D. Hubel,et al. Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[19] marquis de L'Hospital. Analyse des infiniment petits, pour l'intelligence des lignes courbes , 1970 .

[20] Vladimir Vapnik,et al. Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[21] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[22] Roger M. Needham,et al. Note on evaluation , 1973, Inf. Storage Retr..

[23] P. Werbos,et al. Beyond Regression : "New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[24] J. Besag. Statistical Analysis of Non-Lattice Data , 1975 .

[25] T. Zaslavsky. Facing Up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes , 1975 .

[26] D Marr,et al. Cooperative computation of stereo disparity. , 1976, Science.

[27] S. Linnainmaa. Taylor expansion of the accumulated rounding error , 1976 .

[28] 丸山徹. Convex Analysisの二,三の進展について , 1977 .

[29] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[30] Frederick Jelinek,et al. Interpolated estimation of Markov source parameters from sparse data , 1980 .

[31] Kunihiko Fukushima,et al. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[32] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .

[33] G. Lakoff,et al. Metaphors We Live by , 1982 .

[34] Francis Crick,et al. The function of dream sleep , 1983, Nature.

[35] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .

[36] Geoffrey E. Hinton,et al. Massively Parallel Architectures for AI: NETL, Thistle, and Boltzmann Machines , 1983, AAAI.

[37] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.

[38] James R. Wilson. Variance Reduction Techniques for Digital Simulation , 1984 .

[39] D. Rubin. Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician , 1984 .

[40] Geoffrey E. Hinton,et al. A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[41] Geoffrey E. Hinton,et al. Symbols Among the Neurons: Details of a Connectionist Inference Architecture , 1985, IJCAI.

[42] N. J. Cohen,et al. Higher-Order Boltzmann Machines , 1986 .

[43] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.

[44] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .

[45] Geoffrey E. Hinton,et al. The appeal of parallel distributed processing , 1986 .

[46] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .

[47] Yann LeCun,et al. Learning processes in an asymmetric threshold network , 1986 .

[48] Johan Håstad,et al. Almost optimal lower bounds for small depth circuits , 1986, STOC '86.

[49] L. Devroye. Non-Uniform Random Variate Generation , 1986 .

[50] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .

[51] Pavel Pudlák,et al. Threshold circuits of bounded depth , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[52] Geoffrey E. Hinton,et al. Learning Representations by Recirculation , 1987, NIPS.

[53] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[54] Y. T. Zhou,et al. Computation of optical flow using a neural network , 1988, IEEE 1988 International Conference on Neural Networks.

[55] Lalit R. Bahl,et al. Speech recognition with continuous-parameter hidden Markov models , 1987, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[56] Robert A. Jacobs,et al. Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.

[57] Yann LeCun,et al. Generalization and network design strategies , 1989 .

[58] P. Foldiak,et al. Adaptive network for optimal linear feature extraction , 1989, International 1989 Joint Conference on Neural Networks.

[59] Eduardo D. Sontag,et al. Backpropagation Can Give Rise to Spurious Local Minima Even for Networks without Hidden Layers , 1989, Complex Syst..

[60] L.D. Jackel,et al. Analog electronic neural network circuits , 1989, IEEE Circuits and Devices Magazine.

[61] Mohammed Ismail,et al. Analog VLSI Implementation of Neural Systems , 2011, The Kluwer International Series in Engineering and Computer Science.

[62] I. Guyon,et al. Handwritten digit recognition: applications of neural network chips and automatic learning , 1989, IEEE Communications Magazine.

[63] R. Solomonoff. A SYSTEM FOR INCREMENTAL LEARNING BASED ON ALGORITHMIC PROBABILITY , 1989 .

[64] Françoise Fogelman-Soulié,et al. Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition , 1989, EUROSPEECH.

[65] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[66] J. Slawny,et al. Back propagation fails to separate where perceptrons succeed , 1989 .

[67] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[68] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[69] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[70] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .

[71] Hervé Bourlard,et al. Speech pattern discrimination and multilayer perceptrons , 1989 .

[72] Geoffrey E. Hinton. Connectionist Learning Procedures , 1989, Artif. Intell..

[73] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[74] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[75] David Haussler,et al. Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[76] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[77] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[78] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[79] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[80] Kurt Hornik,et al. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks , 1990, Neural Networks.

[81] Jordan B. Pollack,et al. Recursive Distributed Representations , 1990, Artif. Intell..

[82] Geoffrey E. Hinton. Mapping Part-Whole Hierarchies into Connectionist Networks , 1990, Artif. Intell..

[83] S. Mor-Yosef,et al. Ranking the Risk Factors for Cesarean: Logistic Regression Analysis of a Nationwide Study , 1990, Obstetrics and gynecology.

[84] John Cocke,et al. A Statistical Approach to Machine Translation , 1990, CL.

[85] J. Stephen Judd,et al. Neural network design and the complexity of learning , 1990, Neural network modeling and connectionism.

[86] John S. Bridle,et al. Alpha-nets: A recurrent 'neural' network architecture with a hidden Markov model interpretation , 1990, Speech Commun..

[87] Ramanathan V. Guha,et al. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .

[88] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.

[89] Christian Jutten,et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[90] Risto Miikkulainen,et al. Natural Language Processing With Modular PDP Networks and Distributed Lexicon , 1991, Cogn. Sci..

[91] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .

[92] B. Wolf. The Machine That Changed the World , 1991 .

[93] Yoshua Bengio,et al. Neural Network - Gaussian Mixture Hybrid for Speech Recognition or Density Estimation , 1991, NIPS.

[94] Eduardo Sontag,et al. Turing computability with neural nets , 1991 .

[95] Frank Fallside,et al. A recurrent error propagation network speech recognition system , 1991 .

[96] Geoffrey E. Hinton,et al. Lesioning an attractor network: investigations of acquired dyslexia. , 1991, Psychological review.

[97] Thomas M. Cover,et al. Elements of Information Theory , 2005 .

[98] Lawrence D. Jackel,et al. An analog neural network processor with programmable topology , 1991 .

[99] J. L. Holt,et al. Back propagation simulations using limited precision calculations , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[100] D. J. Felleman,et al. Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[101] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[102] Yoshua Bengio,et al. Artificial neural networks and their application to sequence recognition , 1991 .

[103] Michael C. Mozer,et al. Induction of Multiscale Temporal Structure , 1991, NIPS.

[104] Jocelyn Sietsma,et al. Creating artificial neural networks that generalize , 1991, Neural Networks.

[105] Yann LeCun,et al. Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.

[106] Dinh Tuan Pham,et al. Separation of a mixture of independent sources through a maximum likelihood approach , 1992 .

[107] Saul B. Gelfand,et al. Classification trees with neural network feature extraction , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[108] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.

[109] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.

[110] Geoffrey E. Hinton,et al. Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.

[111] Alberto Tesi,et al. On the Problem of Local Minima in Backpropagation , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[112] E. Capaldi,et al. The organization of behavior. , 1992, Journal of applied behavior analysis.

[113] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.

[114] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.

[115] Bruce Christianson,et al. Automatic Hessians by reverse accumulation , 1992 .

[116] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .

[117] Yoshua Bengio,et al. Phonetically motivated acoustic parameters for continuous speech recognition using artificial neural networks , 1991, Speech Commun..

[118] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[119] Patrice Marcotte,et al. Novel approaches to the discrimination problem , 1992, ZOR Methods Model. Oper. Res..

[120] C. R. Rao,et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters , 1992 .

[121] Geoffrey E. Hinton,et al. Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.

[122] Saul B. Gelfand,et al. Classification trees with neural network feature extraction , 1992, IEEE Trans. Neural Networks.

[123] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[124] Y. C. Pati,et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[125] Martin Fodslette Møller,et al. A scaled conjugate gradient algorithm for fast supervised learning , 1993, Neural Networks.

[126] Geoffrey E. Hinton,et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[127] Yoshua Bengio,et al. The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.

[128] Jenq-Neng Hwang,et al. Finite Precision Error Analysis of Neural Network Hardware Implementations , 1993, IEEE Trans. Computers.

[129] D. V. van Essen,et al. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[130] Patrice Y. Simard,et al. Backpropagation without Multiplication , 1993, NIPS.

[131] Kenji Doya,et al. Bifurcations of Recurrent Neural Networks in Gradient Descent Learning , 1993 .

[132] Hermann Ney,et al. Improved clustering techniques for class-based statistical language modelling , 1993, EUROSPEECH.

[133] J. Elman. Learning and development in neural networks: the importance of starting small , 1993, Cognition.

[134] Allan Pinkus,et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1991, Neural Networks.

[135] Wolfgang Maass,et al. Bounds for the computational power and learning complexity of analog neural nets , 1993, SIAM J. Comput..

[136] R. Vaillant,et al. Original approach for the localisation of objects in images , 1994 .

[137] Pierre L'Ecuyer,et al. Efficiency improvement and variance reduction , 1994, Proceedings of Winter Simulation Conference.

[138] Terence D. Sanger,et al. Neural network learning control of robot manipulators using gradually increasing task difficulty , 1994, IEEE Trans. Robotics Autom..

[139] Clark S. Lindsey,et al. Review of hardware neural networks: A User's perspective , 1994 .

[140] Eduardo Sontag,et al. A Comparison of the Computational Power of Sigmoid and Boolean Threshold Circuits , 1994 .

[141] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.

[142] Jürgen Schmidhuber,et al. Simplifying Neural Nets by Discovering Flat Minima , 1994, NIPS.

[143] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .

[144] Schuster,et al. Separation of a mixture of independent signals using time delayed correlations. , 1994, Physical review letters.

[145] Pierre Comon,et al. Independent component analysis, A new concept? , 1994, Signal Process..

[146] Geoffrey E. Hinton,et al. Recognizing Handwritten Digits Using Mixtures of Linear Models , 1994, NIPS.

[147] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[148] R. L. Haggard,et al. A fixed point implementation of the backpropagation learning algorithm , 1994, Proceedings of SOUTHEASTCON '94.

[149] S. Srihari. Mixture Density Networks , 1994 .

[150] Peter Tiňo,et al. Learning long-term dependencies is not as difficult with NARX recurrent neural networks , 1995 .

[151] Christopher M. Bishop,et al. Regularization and complexity control in feed-forward networks , 1995 .

[152] L. Ljung,et al. Overtraining, regularization and searching for a minimum, with application to neural networks , 1995 .

[153] Carl E. Rasmussen,et al. In Advances in Neural Information Processing Systems , 2011 .

[154] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .

[155] J. E. Jackson,et al. Statistical Factor Analysis and Related Methods: Theory and Applications , 1995 .

[156] J. J. Moré,et al. Global continuation for distance geometry problems , 1995 .

[157] Geoffrey E. Hinton,et al. The Helmholtz Machine , 1995, Neural Computation.

[158] Geoffrey E. Hinton,et al. The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[159] Eduard Aved’yan,et al. Multilayer Neural Networks , 1995 .

[160] Jonathan Baxter,et al. Learning internal representations , 1995, COLT '95.

[161] Hava T. Siegelmann,et al. On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..

[162] Yoshua Bengio,et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.

[163] S. Mase,et al. Consistency of the Maximum Pseudo-Likelihood Estimator of Continuous State Space Gibbsian Processes , 1995 .

[164] H T Siegelmann,et al. Dating and Context of Three Middle Stone Age Sites with Bone Points in the Upper Semliki Valley, Zaire , 2007 .

[165] Christopher M. Bishop,et al. Current address: Microsoft Research, , 2022 .

[166] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .

[167] Michael I. Jordan,et al. Exploiting Tractable Substructures in Intractable Networks , 1995, NIPS.

[168] Yochai Konig,et al. REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition , 1995, NIPS.

[169] C. Lee Giles,et al. An analysis of noise in recurrent neural networks: convergence and generalization , 1996, IEEE Trans. Neural Networks.

[170] Geoffrey E. Hinton,et al. The EM algorithm for mixtures of factor analyzers , 1996 .

[171] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..

[172] Yann LeCun,et al. Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation , 1996, Neural Networks: Tricks of the Trade.

[173] Heikki Hyotyniemi,et al. Turing Machines Are Recurrent Neural Networks , 1996 .

[174] Radford M. Neal. Sampling from multimodal distributions using tempered transitions , 1996, Stat. Comput..

[175] San Cristóbal Mateo,et al. The Lack of A Priori Distinctions Between Learning Algorithms , 1996 .

[176] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.

[177] Geoffrey E. Hinton,et al. Varieties of Helmholtz Machine , 1996, Neural Networks.

[178] Jürgen Schmidhuber,et al. Sequential neural text compression , 1996, IEEE Trans. Neural Networks.

[179] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[180] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[181] Yoav Freund,et al. Game theory, on-line prediction and boosting , 1996, COLT '96.

[182] Brian Kingsbury,et al. Spert-II: A Vector Microprocessor System , 1996, Computer.

[183] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.

[184] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .

[185] B. Sparkes. The Red and the Black: Studies in Greek Pottery , 1996 .

[186] Yoshua Bengio,et al. Training Methods for Adaptive Boosting of Neural Networks , 1997, NIPS.

[187] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[188] Geoffrey E. Hinton,et al. Modeling the manifolds of images of handwritten digits , 1997, IEEE Trans. Neural Networks.

[189] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[190] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..

[191] Ah Chung Tsoi,et al. Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.

[192] Alessandro Sperduti,et al. On the Efficient Classification of Data Structures by Neural Networks , 1997, IJCAI.

[193] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[194] C. Jarzynski. Nonequilibrium Equality for Free Energy Differences , 1996, cond-mat/9610209.

[195] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[196] George Trapp,et al. Using Complex Variables to Estimate Derivatives of Real Functions , 1998, SIAM Rev..

[197] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[198] Eduardo Sontag. VC dimension of neural networks , 1998 .

[199] Brendan J. Frey,et al. Graphical Models for Machine Learning and Digital Communication , 1998 .

[200] D. Simons,et al. Failure to detect changes to people during a real-world interaction , 1998 .

[201] Geoffrey E. Hinton,et al. A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[202] Alessandro Sperduti,et al. A general framework for adaptive processing of data structures , 1998, IEEE Trans. Neural Networks.

[203] Georges Raepsaet. David W. Tandy & Walter C. Neale, Hesiod' s Works and Days. A Translation and Commentary for the Social Sciences , 1998 .

[204] Alexander J. Smola,et al. Learning with kernels , 1998 .

[205] Aapo Hyvärinen,et al. Emergence of Topography and Complex Cell Properties from Natural Images using Extensions of ICA , 1999, NIPS.

[206] Aapo Hyvärinen,et al. Nonlinear independent component analysis: Existence and uniqueness results , 1999, Neural Networks.

[207] Samy Bengio,et al. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks , 1999, NIPS.

[208] Aapo Hyvärinen,et al. Survey on Independent Component Analysis , 1999 .

[209] Giovanni Soda,et al. Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..

[210] B. Schölkopf,et al. Advances in kernel methods: support vector learning , 1999 .

[211] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[212] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .

[213] F ChenStanley,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[214] Terrence J. Sejnowski,et al. Unsupervised Learning , 2018, Encyclopedia of GIS.

[215] Mike Schuster,et al. On supervised learning from sequential data with applications for speech regognition , 1999 .

[216] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[217] Samy Bengio,et al. Taking on the curse of dimensionality in joint distributions using neural networks , 2000, IEEE Trans. Neural Networks Learn. Syst..

[218] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[219] Shaogang Gong,et al. Dynamic Vision - From Images to Face Recognition , 2000 .

[220] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[221] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[222] Juha Karhunen,et al. Nonlinear Independent Component Analysis Using Ensemble Learning: Experiments and Discussion , 2000 .

[223] Geoffrey E. Hinton,et al. Extracting distributed representations of concepts and relations from positive and negative propositions , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[224] M. Sur,et al. Visual behaviour mediated by retinal projections directed to the auditory pathway , 2000, Nature.

[225] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[226] Yoshua Bengio,et al. Série Scientifique Scientific Series Incorporating Second-order Functional Knowledge for Better Option Pricing Incorporating Second-order Functional Knowledge for Better Option Pricing , 2022 .

[227] Yoshua Bengio,et al. Gradient-Based Optimization of Hyperparameters , 2000, Neural Computation.

[228] Sven Behnke,et al. Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid , 2001, Int. J. Comput. Intell. Appl..

[229] Radford M. Neal. Annealed importance sampling , 1998, Stat. Comput..

[230] Joshua Goodman,et al. Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[231] Paul A. Viola,et al. Robust Real-time Object Detection , 2001 .

[232] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.

[233] Geoffrey E. Hinton,et al. Global Coordination of Local Linear Models , 2001, NIPS.

[234] DeLiang Wang,et al. Unsupervised Learning: Foundations of Neural Computation , 2001, AI Mag..

[235] Aapo Hyvärinen,et al. Topographic Independent Component Analysis , 2001, Neural Computation.

[236] Mikhail Belkin,et al. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[237] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[238] Yukito Iba. EXTENDED ENSEMBLE MONTE CARLO , 2001 .

[239] D K Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc..

[240] Yee Whye Teh,et al. A New View of ICA , 2001 .

[241] Refractor. Metamorphoses , 1868, The Lancet.

[242] Stan Lipovetsky,et al. Latent Variable Models and Factor Analysis , 2001, Technometrics.

[243] Geoffrey E. Hinton,et al. Self Supervised Boosting , 2002, NIPS.

[244] Geoffrey E. Hinton,et al. Learning Sparse Topographic Representations with Products of Student-t Distributions , 2002, NIPS.

[245] Geoffrey E. Hinton,et al. Stochastic Neighbor Embedding , 2002, NIPS.

[246] Samy Bengio,et al. A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[247] K. I. WilliamsDivision,et al. Products of Gaussians and Probabilistic Minor Component Analysis , 2002, Neural Computation.

[248] Herbert Jaeger,et al. Adaptive Nonlinear System Identification with Echo State Networks , 2002, NIPS.

[249] Pascal Vincent,et al. Manifold Parzen Windows , 2002, NIPS.

[250] Terrence J. Sejnowski,et al. Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[251] Bernhard Schölkopf,et al. Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.

[252] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[253] F. Huang,et al. Generalized Pseudo-Likelihood Estimates for Markov Random Fields on Lattice , 2002 .

[254] Matthew Brand,et al. Charting a Manifold , 2002, NIPS.

[255] Feng-Hsiung Hsu,et al. Behind Deep Blue: Building the Computer that Defeated the World Chess Champion , 2002 .

[256] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.

[257] Jean-Luc Gauvain,et al. Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[258] Yoshua Bengio,et al. No Unbiased Estimator of the Variance of K-Fold Cross-Validation , 2003, J. Mach. Learn. Res..

[259] James Henderson,et al. Inducing History Representations for Broad Coverage Statistical Parsing , 2003, NAACL.

[260] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[261] Tony R. Martinez,et al. The general inefficiency of batch training for gradient descent learning , 2003, Neural Networks.

[262] Yoshua Bengio,et al. Quick Training of Probabilistic Neural Nets by Importance Sampling , 2003 .

[263] Mikhail Belkin,et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[264] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[265] D. Donoho,et al. Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data , 2003 .

[266] Yee Whye Teh,et al. Energy-Based Models for Sparse Overcomplete Representations , 2003, J. Mach. Learn. Res..

[267] William S. Rayens,et al. Independent Component Analysis: Principles and Practice , 2003, Technometrics.

[268] Valeriu Beiu,et al. VLSI implementations of threshold logic-a comprehensive survey , 2003, IEEE Trans. Neural Networks.

[269] Kunihiko Fukushima,et al. Cognitron: A self-organizing multilayered neural network , 1975, Biological Cybernetics.

[270] Harald Haas,et al. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication , 2004, Science.

[271] James Henderson,et al. Discriminative Training of a Neural Network Statistical Parser , 2004, ACL.

[272] Geoffrey E. Hinton,et al. Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[273] Geoffrey E. Hinton,et al. Neighbourhood Components Analysis , 2004, NIPS.

[274] G. Peterson. A day of great illumination: B. F. Skinner's discovery of shaping. , 2004, Journal of the experimental analysis of behavior.

[275] Dario L. Ringach,et al. Reverse correlation in neurophysiology , 2004, Cogn. Sci..

[276] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[277] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[278] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.

[279] Christophe Garcia,et al. Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[280] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[281] Yann LeCun,et al. Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..

[282] Kilian Q. Weinberger,et al. Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[283] James L. McClelland,et al. Semantic Cognition: A Parallel Distributed Processing Approach , 2004 .

[284] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[285] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.

[286] Yoshua Bengio,et al. Non-Local Manifold Tangent Learning , 2004, NIPS.

[287] Kilian Q. Weinberger,et al. Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, CVPR.

[288] H. Bourlard,et al. Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[289] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[290] M. Tribus,et al. Probability theory: the logic of science , 2003 .

[291] Lawrence Cayton,et al. Algorithms for manifold learning , 2005 .

[292] H. Inayoshi,et al. Improved Generalization by Adding both Auto-Association and Hidden-Layer-Noise to Neural-Network-Based-Classifiers , 2005, 2005 IEEE Workshop on Machine Learning for Signal Processing.

[293] David J. Field,et al. How Close Are We to Understanding V1? , 2005, Neural Computation.

[294] Radford M. Neal. Estimating Ratios of Normalizing Constants Using Linked Importance Sampling , 2005, math/0511216.

[295] Laurenz Wiskott,et al. Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[296] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[297] Paola Velardi,et al. Structural semantic interconnections: a knowledge-based approach to word sense disambiguation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[298] Johan Håstad,et al. On the power of small-depth threshold circuits , 1991, computational complexity.

[299] C. Koch,et al. Invariant visual representation by single neurons in the human brain , 2005, Nature.

[300] Eero P. Simoncelli,et al. Spatiotemporal Elements of Macaque V1 Receptive Fields , 2005, Neuron.

[301] Nicolas Le Roux,et al. Convex Neural Networks , 2005, NIPS.

[302] Adi Shraibman,et al. Rank, Trace-Norm and Max-Norm , 2005, COLT.

[303] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[304] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..

[305] Geoffrey E. Hinton. What kind of graphical model is the brain? , 2005, IJCAI.

[306] Thomas P. Minka,et al. Divergence measures and message passing , 2005 .

[307] Pascal Vincent,et al. Non-Local Manifold Parzen Windows , 2005, NIPS.

[308] Nicolas Le Roux,et al. The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.

[309] Yann LeCun,et al. Toward automatic phenotyping of developing embryos from videos , 2005, IEEE Transactions on Image Processing.

[310] Yoshua Bengio,et al. Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.

[311] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.

[312] Patrice Y. Simard,et al. Using GPUs for machine learning algorithms , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[313] Marta R. Costa-jussà,et al. Continuous space language models for the IWSLT 2006 task , 2006, IWSLT.

[314] Patrice Y. Simard,et al. High Performance Convolutional Neural Networks for Document Processing , 2006 .

[315] Rich Caruana,et al. Model compression , 2006, KDD '06.

[316] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[317] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[318] Geoffrey E. Hinton,et al. Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.

[319] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .

[320] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[321] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[322] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[323] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[324] Kaare Brandt Petersen,et al. The Matrix Cookbook , 2006 .

[325] Holger Schwenk,et al. Continuous Space Language Models for Statistical Machine Translation , 2006, ACL.

[326] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.

[327] Tom Minka,et al. Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[328] Holger Schwenk,et al. Continuous space language models , 2007, Comput. Speech Lang..

[329] Max Welling. Products of Experts , 2007 .

[330] Geoffrey E. Hinton. Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[331] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[332] Aapo Hyvärinen,et al. Connections Between Score Matching, Contrastive Divergence, and Pseudolikelihood for Continuous-Valued Variables , 2007, IEEE Transactions on Neural Networks.

[333] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.

[334] Geoffrey E. Hinton,et al. Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[335] James Bennett,et al. The Netflix Prize , 2007 .

[336] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.

[337] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[338] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.

[339] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.

[340] Jürgen Schmidhuber,et al. Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.

[341] Yann LeCun,et al. Online Learning for Offroad Robots: Spatial Label Propagation to Learn Long-Range Traversability , 2007, Robotics: Science and Systems.

[342] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .

[343] Geoffrey E. Hinton,et al. Modeling image patches with a directed hierarchy of Markov random fields , 2007, NIPS.

[344] Laurenz Wiskott,et al. Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells , 2007, PLoS Comput. Biol..

[345] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.

[346] Herbert Jaeger,et al. Optimization and applications of echo state networks with leaky-integrator neurons , 2007, Neural Networks.

[347] Geoffrey E. Hinton,et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.

[348] Geoffrey E. Hinton,et al. Three new graphical models for statistical language modelling , 2007, ICML '07.

[349] Shawki Areibi,et al. The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study , 2007, IEEE Transactions on Neural Networks.

[350] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[351] Aapo Hyvärinen,et al. Some extensions of score matching , 2007, Comput. Stat. Data Anal..

[352] Ruslan Salakhutdinov,et al. Probabilistic Matrix Factorization , 2007, NIPS.

[353] Yoshua Bengio. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[354] Geoffrey E. Hinton,et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes , 2007, NIPS.

[355] Joseph F. Murray,et al. Supervised Learning of Image Restoration with Convolutional Networks , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[356] Geoffrey E. Hinton,et al. The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.

[357] Ruslan Salakhutdinov,et al. On the quantitative analysis of deep belief networks , 2008, ICML '08.

[358] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[359] Jim Hefferon,et al. Linear Algebra , 2012 .

[360] Geoffrey E. Hinton,et al. Deep, Narrow Sigmoid Belief Networks Are Universal Approximators , 2008, Neural Computation.

[361] Nicolas Le Roux,et al. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks , 2008, Neural Computation.

[362] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[363] Sunita Sarawagi. Learning with Graphical Models , 2008 .

[364] Nicolas Pinto,et al. Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..

[365] Yoshua Bengio,et al. Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model , 2008, IEEE Transactions on Neural Networks.

[366] Tijmen Tieleman,et al. Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[367] D. Lizotte. Practical bayesian optimization , 2008 .

[368] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[369] Antonio Torralba,et al. Spectral Hashing , 2008, NIPS.

[370] Yoshua Bengio,et al. Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[371] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[372] Geoffrey E. Hinton,et al. A Scalable Hierarchical Distributed Language Model , 2008, NIPS.

[373] Alex Graves,et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[374] David M. Bradley,et al. Differentiable Sparse Coding , 2008, NIPS.

[375] Niko Wilbert,et al. Invariant Object Recognition with Slow Feature Analysis , 2008, ICANN.

[376] Herbert Jaeger,et al. Discovering multiscale dynamical features with hierarchical Echo State Networks , 2008 .

[377] Uwe Naumann,et al. Optimal Jacobian accumulation is NP-complete , 2007, Math. Program..

[378] Yoshua Bengio,et al. Zero-data Learning of New Tasks , 2008, AAAI.

[379] Geoffrey E. Hinton,et al. Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.

[380] Yoshua Bengio,et al. Slow, Decorrelated Features for Pretraining Complex Cell-like Networks , 2009, NIPS.

[381] Yoshua Bengio,et al. Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..

[382] H. Sebastian Seung,et al. Maximin affinity learning of image segmentation , 2009, NIPS.

[383] Kunle Olukotun,et al. A highly scalable Restricted Boltzmann Machine FPGA implementation , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[384] Geoffrey E. Hinton,et al. Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[385] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[386] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[387] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[388] Yann LeCun,et al. Learning long‐range vision for autonomous off‐road driving , 2009, J. Field Robotics.

[389] Quoc V. Le,et al. Measuring Invariances in Deep Networks , 2009, NIPS.

[390] Geoffrey E. Hinton,et al. Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[391] Aapo Hyvärinen,et al. Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[392] J. Schmidhuber,et al. A Novel Connectionist System for Unconstrained Handwriting Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[393] R. Fergus,et al. Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[394] Geoffrey E. Hinton,et al. 3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[395] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[396] Siwei Lyu,et al. Interpretation and Generalization of Score Matching , 2009, UAI.

[397] Herbert Jaeger,et al. Reservoir computing approaches to recurrent neural network training , 2009, Comput. Sci. Rev..

[398] Varun Chandola,et al. Anomaly detection: A survey , 2009, CSUR.

[399] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.

[400] M. Del Giudice,et al. Programmed to learn? The ontogeny of mirror neurons. , 2009, Developmental science.

[401] Yehuda Koren,et al. The BellKor Solution to the Netflix Grand Prize , 2009 .

[402] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[403] Manfred Opper,et al. The Variational Gaussian Approximation Revisited , 2009, Neural Computation.

[404] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.

[405] Ruslan Salakhutdinov,et al. Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[406] Pascal Vincent,et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training , 2009, AISTATS.

[407] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason..

[408] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .

[409] Kai A. Krueger,et al. Flexible shaping: How learning in small steps helps , 2009, Cognition.

[410] Hossein Mobahi,et al. Deep learning from temporal coherence in video , 2009, ICML '09.

[411] Rajat Raina,et al. Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[412] Geoffrey E. Hinton,et al. Deep Belief Networks for phone recognition , 2009 .

[413] Hugo Larochelle,et al. Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[414] Quoc V. Le,et al. Tiled convolutional neural networks , 2010, NIPS.

[415] Geoffrey E. Hinton,et al. Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.

[416] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[417] Dong Yu,et al. Sequential Labeling Using Deep-Structured Conditional Random Fields , 2010, IEEE Journal of Selected Topics in Signal Processing.

[418] Geoffrey E. Hinton,et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[419] Yann LeCun,et al. Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields , 2010, ArXiv.

[420] Indranil Saha,et al. journal homepage: www.elsevier.com/locate/neucom , 2022 .

[421] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[422] Valentin I. Spitkovsky,et al. From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing , 2010, NAACL.

[423] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.

[424] Nicolas Le Roux,et al. Deep Belief Networks Are Compact Universal Approximators , 2010, Neural Computation.

[425] Nicolas Le Roux,et al. The Learning Workshop Snowbird , 2010 .

[426] Ilya Sutskever,et al. On the Convergence Properties of Contrastive Divergence , 2010, AISTATS.

[427] Bo Chen,et al. Deep Learning of Invariant Spatio-Temporal Features from Video , 2010 .

[428] Julian Eggert,et al. Binary Sparse Coding , 2010, LVA/ICA.

[429] Yann LeCun,et al. Regularized estimation of image statistics by Score Matching , 2010, NIPS.

[430] Tapani Raiko,et al. Parallel tempering is efficient for learning restricted Boltzmann machines , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[431] Geoffrey E. Hinton,et al. Generating more realistic images using gated MRF's , 2010, NIPS.

[432] Pascal Vincent,et al. Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.

[433] Geoffrey E. Hinton,et al. Learning to Detect Roads in High-Resolution Aerial Images , 2010, ECCV.

[434] Y-Lan Boureau,et al. Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[435] Geoffrey E. Hinton,et al. Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine , 2010, NIPS.

[436] Fei-Fei Li,et al. What Does Classifying More Than 10,000 Image Categories Tell Us? , 2010, ECCV.

[437] Daphne Koller,et al. Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[438] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[439] J. G. Garson,et al. The metric system of identification of criminals, as used in Great Britain and Ireland. , 2010 .

[440] Jason Weston,et al. Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[441] Martin Pál,et al. Contextual Multi-Armed Bandits , 2010, AISTATS.

[442] Lise Getoor,et al. Learning in Logic , 2010, Encyclopedia of Machine Learning.

[443] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[444] Hariharan Narayanan,et al. Sample Complexity of Testing the Manifold Hypothesis , 2010, NIPS.

[445] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[446] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[447] Nando de Freitas,et al. Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[448] Yann LeCun,et al. Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[449] Marc'Aurelio Ranzato,et al. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[450] Chris Eliasmith,et al. Deep networks for robust visual recognition , 2010, ICML.

[451] Yoshua Bengio,et al. Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[452] Pascal Vincent,et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[453] Geoffrey E. Hinton,et al. Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[454] Geoffrey E. Hinton,et al. Binary coding of speech spectrograms using a deep auto-encoder , 2010, INTERSPEECH.

[455] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[456] Geoffrey E. Hinton,et al. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[457] Yann LeCun,et al. Learning Fast Approximations of Sparse Coding , 2010, ICML.

[458] Rocco A. Servedio,et al. Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate , 2010, ICML.

[459] Peggy Seriès,et al. Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model , 2010, NIPS.

[460] Jan Peters,et al. Policy Gradient Methods , 2010, Encyclopedia of Machine Learning.

[461] Ian J. Goodfellow,et al. Help me help you: Interfaces for personal robots , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[462] Joseph F. Murray,et al. Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation , 2010, Neural Computation.

[463] Andrew Y. Ng,et al. Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[464] Yoshua Bengio,et al. Algorithms for Hyper-Parameter Optimization , 2011, NIPS.

[465] Nando de Freitas,et al. Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood , 2011, UAI.

[466] Veselin Stoyanov,et al. Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[467] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.

[468] Brendan J. Frey,et al. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context , 2011, Bioinform..

[469] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[470] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[471] Nicolas Pinto,et al. Beyond simple features: A large-scale feature search approach to unconstrained face recognition , 2011, Face and Gesture 2011.

[472] Pascal Vincent,et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[473] Tapani Raiko,et al. Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.

[474] Jason Weston,et al. Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.

[475] Zhenghao Chen,et al. On Random Weights and Unsupervised Feature Learning , 2011, ICML.

[476] Pascal Vincent,et al. Higher Order Contractive Auto-Encoder , 2011, ECML/PKDD.

[477] Yoshua Bengio,et al. On Tracking The Partition Function , 2011, NIPS.

[478] Nando de Freitas,et al. On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.

[479] Hugo Larochelle,et al. The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[480] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[481] Yann LeCun,et al. Unsupervised Learning of Sparse Features for Scalable Audio Classification , 2011, ISMIR.

[482] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[483] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[484] Geoffrey E. Hinton,et al. Learning a better representation of speech soundwaves using restricted boltzmann machines , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[485] Yoshua Bengio,et al. Shallow vs. Deep Sum-Product Networks , 2011, NIPS.

[486] Mohamed Chtourou,et al. On the training of recurrent neural networks , 2011, Eighth International Multi-Conference on Systems, Signals & Devices.

[487] Yoshua Bengio,et al. Unsupervised Models of Images by Spike-and-Slab RBMs , 2011, ICML.

[488] Peggy Seriès,et al. Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability , 2011, NIPS.

[489] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.

[490] David J. Fleet,et al. Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.

[491] Kewei Tu,et al. On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars , 2011, IJCAI.

[492] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[493] Andrew Y. Ng,et al. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[494] Kevin Leyton-Brown,et al. Sequential Model-Based Optimization for General Algorithm Configuration , 2011, LION.

[495] Lukás Burget,et al. Empirical Evaluation and Combination of Advanced Language Modeling Techniques , 2011, INTERSPEECH.

[496] Pascal Vincent,et al. A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[497] Geoffrey E. Hinton,et al. Using very deep autoencoders for content-based image retrieval , 2011, ESANN.

[498] Yong Jae Lee,et al. Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.

[499] Pascal Vincent,et al. The Manifold Tangent Classifier , 2011, NIPS.

[500] Ruimin Shen,et al. Learning Class-relevant Features and Class-irrelevant Features via a Hybrid third-order RBM , 2011, AISTATS.

[501] Geoffrey E. Hinton,et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction , 2011, UAI.

[502] Nicolas Le Roux,et al. Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[503] Andrew Y. Ng,et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[504] David Cox,et al. Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook , 2011, CVPR 2011 WORKSHOPS.

[505] Yoshua Bengio,et al. Large-Scale Learning of Embeddings with Reconstruction Sampling , 2011, ICML.

[506] Jeffrey Pennington,et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[507] Yoshua Bengio,et al. Incorporating complex cells into neural networks for pattern classification , 2011 .

[508] John Langford,et al. Scaling up machine learning: parallel and distributed approaches , 2011, KDD '11 Tutorials.

[509] Berin Martini,et al. Large-Scale FPGA-based Convolutional Networks , 2011 .

[510] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[511] Tara N. Sainath,et al. Deep Belief Networks using discriminative features for phone recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[512] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[513] Nihat Ay,et al. Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines , 2010, Neural Computation.

[514] Lukás Burget,et al. Strategies for training large scale neural network language models , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[515] Jeffrey Pennington,et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[516] Pedro M. Domingos,et al. Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[517] Ronan Collobert,et al. Deep Learning for Efficient Discriminative Parsing , 2011, AISTATS.

[518] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.

[519] Bilge Mutlu,et al. How Do Humans Teach: On Curriculum Learning and Teaching Dimension , 2011, NIPS.

[520] Vincent Vanhoucke,et al. Improving the speed of neural networks on CPUs , 2011 .

[521] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[522] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[523] Jürgen Schmidhuber,et al. Self-Delimiting Neural Networks , 2012, ArXiv.

[524] Yoshua Bengio,et al. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription , 2012, ICML.

[525] Bernhard Schölkopf,et al. A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[526] T. Ciodaro,et al. Online particle detection with Neural Networks based on topological calorimetry information , 2012 .

[527] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[528] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[529] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.

[530] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[531] Herbert Jaeger,et al. Long Short-Term Memory in Echo State Networks: Details of a Simulation Study , 2012 .

[532] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.

[533] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.

[534] Yann LeCun,et al. Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[535] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res.

[536] Yee Whye Teh,et al. A fast and simple algorithm for training neural probabilistic language models , 2012, ICML.

[537] Jürgen Schmidhuber,et al. Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[538] Yoshua Bengio,et al. Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.

[539] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[540] Trevor Darrell,et al. Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[541] Klaus-Robert Müller,et al. Deep Boltzmann Machines and the Centering Trick , 2012, Neural Networks: Tricks of the Trade.

[542] Misha Denil,et al. Learning Where to Attend with Deep Architectures for Image Tracking , 2011, Neural Computation.

[543] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.

[544] Stefan J. Kiebel,et al. Re-visiting the echo state property , 2012, Neural Networks.

[545] E. Culurciello,et al. NeuFlow: Dataflow vision processing system-on-a-chip , 2012, 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS).

[546] Tomas Mikolov. Statistical Language Models Based on Neural Networks , 2012 .

[547] Yann LeCun,et al. Convolutional neural networks applied to house numbers digit classification , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[548] Yoshua Bengio,et al. Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery , 2012, ArXiv.

[549] Yoshua Bengio,et al. A Generative Process for sampling Contractive Auto-Encoders , 2012, ICML.

[550] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[551] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[552] Ivan Titov,et al. Inducing Crosslingual Distributed Representations of Words , 2012, COLING.

[553] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[554] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[555] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[556] Bernhard Schölkopf,et al. On causal and anticausal learning , 2012, ICML.

[557] Jason Weston,et al. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing , 2012, AISTATS.

[558] Geoffrey E. Hinton,et al. Deep Mixtures of Factor Analysers , 2012, ICML.

[559] Quoc V. Le,et al. Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[560] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[561] Tara N. Sainath,et al. Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[562] Deva Ramanan,et al. Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[563] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[564] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[565] Jason Weston,et al. Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[566] Hugo Larochelle,et al. RNADE: The real-valued neural autoregressive density-estimator , 2013, NIPS.

[567] Geoffrey E. Hinton,et al. Modeling Natural Images Using Gated MRFs , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[568] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[569] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[570] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.

[571] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[572] Koray Kavukcuoglu,et al. Learning word embeddings efficiently with noise-contrastive estimation , 2013, NIPS.

[573] Nitish Srivastava,et al. Modeling Documents with Deep Boltzmann Machines , 2013, UAI.

[574] Yoshua Bengio,et al. Multi-Prediction Deep Boltzmann Machines , 2013, NIPS.

[575] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[576] Nitish Srivastava,et al. Improving Neural Networks with Dropout , 2013 .

[577] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.

[578] Geoffrey E. Hinton,et al. On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[579] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[580] Yoshua Bengio,et al. Deep Learning of Representations: Looking Forward , 2013, SLSP.

[581] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[582] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[583] Ian J. Goodfellow,et al. Pylearn2: a machine learning research library , 2013, ArXiv.

[584] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[585] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[586] Diederik P. Kingma. Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form , 2013, ArXiv.

[587] Honglak Lee,et al. Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines , 2013, ICML.

[588] Yann LeCun,et al. Indoor Semantic Segmentation using depth information , 2013, ICLR.

[589] Phil Blunsom,et al. Recurrent Continuous Translation Models , 2013, EMNLP.

[590] Yann LeCun,et al. Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[591] Meng Cai,et al. Deep maxout neural networks for speech recognition , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[592] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.

[593] Yoshua Bengio,et al. Maxout Networks , 2013, ICML.

[594] Laurenz Wiskott,et al. How to Center Binary Deep Boltzmann Machines , 2013, ArXiv:1311.1354.

[595] Léon Bottou,et al. From machine learning to machine reasoning , 2011, Machine Learning.

[596] Yoshua Bengio,et al. Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs , 2013, NIPS.

[597] Yoshua Bengio,et al. Better Mixing via Deep Representations , 2012, ICML.

[598] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.

[599] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.

[600] Srinivas C. Turaga,et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina , 2013, Nature.

[601] Benjamin Schrauwen,et al. Training energy-based models for time-series imputation , 2013, J. Mach. Learn. Res.

[602] Sumit Basu,et al. Teaching Classification Boundaries to Humans , 2013, AAAI.

[603] Jason Weston,et al. A semantic matching energy function for learning with multi-relational data , 2013, Machine Learning.

[604] Christopher D. Manning,et al. Fast dropout training , 2013, ICML.

[605] Yoshua Bengio,et al. Scaling Up Spike-and-Slab Models for Unsupervised Feature Learning , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[606] Benjamin Schrauwen,et al. Deep content-based music recommendation , 2013, NIPS.

[607] Yoshua Bengio,et al. Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions , 2012, AISTATS.

[608] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .

[609] Oswin Krause,et al. Approximation properties of DBNs with binary hidden units and real-valued visible units , 2013, ICML.

[610] Larry P. Heck,et al. Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[611] Razvan Pascanu,et al. Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.

[612] Karol Gregor,et al. Neural Variational Inference and Learning in Belief Networks , 2014, ICML.

[613] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.

[614] Richard M. Schwartz,et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[615] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[616] Wojciech Zaremba,et al. Learning to Execute , 2014, ArXiv.

[617] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[618] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[619] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[620] Yoshua Bengio,et al. The Spike-and-Slab RBM and Extensions to Discrete and Sparse Data Distributions , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[621] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[622] Ronan Collobert,et al. Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[623] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[624] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[625] Zhen Wang,et al. Knowledge Graph Embedding by Translating on Hyperplanes , 2014, AAAI.

[626] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.

[627] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[628] Yoshua Bengio,et al. What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res.

[629] Jason Weston,et al. Question Answering with Subgraph Embeddings , 2014, EMNLP.

[630] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.

[631] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.

[632] Franz Pernkopf,et al. General Stochastic Networks for Classification , 2014, NIPS.

[633] David Sussillo,et al. Random Walks: Training Very Deep Nonlinear Feed-Forward Networks with Smart Initialization , 2014, ArXiv.

[634] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[635] Yoshua Bengio,et al. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.

[636] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[637] James Martens,et al. On the Expressive Efficiency of Sum Product Networks , 2014, ArXiv.

[638] Jasper Snoek,et al. Freeze-Thaw Bayesian Optimization , 2014, ArXiv.

[639] Razvan Pascanu,et al. How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[640] Jürgen Schmidhuber,et al. A Clockwork RNN , 2014, ICML.

[641] Phil Blunsom,et al. Learning Bilingual Word Representations by Marginalizing Alignments , 2014, ACL.

[642] Parul Parashar,et al. Neural Networks in Machine Learning , 2014 .

[643] P. Baldi,et al. Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.

[644] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[645] Yoshua Bengio,et al. Deep Generative Stochastic Networks Trainable by Backprop , 2013, ICML.

[646] Max Welling,et al. Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[647] Daan Wierstra,et al. Deep AutoRegressive Networks , 2013, ICML.

[648] Yoshua Bengio,et al. An empirical analysis of dropout in piecewise linear networks , 2013, ICLR.

[649] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[650] Yann LeCun,et al. The Loss Surface of Multilayer Networks , 2014, ArXiv.

[651] Hugo Larochelle,et al. A Deep and Tractable Density Estimator , 2013, ICML.

[652] Guido Montúfar,et al. Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units , 2013, Neural Computation.

[653] Daniel L. K. Yamins,et al. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition , 2014, PLoS Comput. Biol.

[654] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.

[655] Tom Schaul,et al. Unit Tests for Stochastic Optimization , 2013, ICLR.

[656] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.

[657] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.

[658] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[659] Max Welling,et al. Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets , 2014, ICML.

[660] W. Karush. Minima of Functions of Several Variables with Inequalities as Side Conditions , 2014 .

[661] Matthias Bethge,et al. How close are we to understanding image-based saliency? , 2014, ArXiv.

[662] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[663] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.

[664] Yaroslav Bulatov,et al. Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[665] Ivan Laptev,et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[666] Razvan Pascanu,et al. On the number of inference regions of deep feed forward networks with piece-wise linear activations , 2013, ICLR.

[667] Tapani Raiko,et al. Iterative Neural Autoregressive Distribution Estimator NADE-k , 2014, NIPS.

[668] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.

[669] Christian Osendorfer,et al. Learning Stochastic Recurrent Networks , 2014, NIPS.

[670] Brendan J. Frey,et al. Deep learning of the tissue-regulated splicing code , 2014, Bioinform.

[671] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[672] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.

[673] Surya Ganguli,et al. Analyzing noise in autoencoders and deep networks , 2014, ArXiv.

[674] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.

[675] Balázs Kégl,et al. The Higgs boson machine learning challenge , 2014, HEPML@NIPS.

[676] Zhen Wang,et al. Knowledge Graph and Text Jointly Embedding , 2014, EMNLP.

[677] Dong Yu,et al. Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process.

[678] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.

[679] Jian Zhou,et al. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction , 2014, ICML.

[680] Roland Memisevic,et al. The Potential Energy of an Autoencoder , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[681] Yoshua Bengio,et al. A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.

[682] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[683] Ferenc Huszar,et al. How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? , 2015, ArXiv.

[684] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[685] Yoshua Bengio,et al. On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.

[686] Nadav Cohen,et al. On the Expressive Power of Deep Learning: A Tensor Analysis , 2015, COLT 2016.

[687] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.

[688] Zoubin Ghahramani,et al. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.

[689] Grzegorz Chrupala,et al. Learning language through pictures , 2015, ACL.

[690] Philip Bachman,et al. Variational Generative Stochastic Networks with Collaborative Shaping , 2015, ICML.

[691] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.

[692] Rob Fergus,et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[693] Pierre-Luc Bacon. Conditional computation in neural networks using a decision-theoretic approach , 2015 .

[694] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[695] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.

[696] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[697] Zoubin Ghahramani,et al. Training generative neural networks via Maximum Mean Discrepancy optimization , 2015, UAI.

[698] B. Frey,et al. The human splicing code reveals new insights into the genetic determinants of disease , 2015, Science.

[699] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.

[700] Vadlamani Ravi,et al. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications , 2015, Knowl. Based Syst.

[701] Robert P. Sheridan,et al. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships , 2015, J. Chem. Inf. Model.

[702] Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.

[703] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[704] Misha Denil,et al. From Group to Individual Labels Using Deep Features , 2015, KDD.

[705] Sergey Levine,et al. Learning Visual Feature Spaces for Robotic Manipulation with Deep Spatial Autoencoders , 2015, ArXiv.

[706] Steve Renals,et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition , 2015, INTERSPEECH.

[707] Jason Weston,et al. Memory Networks , 2014, ICLR.

[708] Xavier Bouthillier,et al. Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets , 2014, NIPS.

[709] Koray Kavukcuoglu,et al. Visual Attention , 2020, Computational Models for Cognitive Vision.

[710] Yoshua Bengio,et al. Reweighted Wake-Sleep , 2014, ICLR.

[711] Yoshua Bengio,et al. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks , 2015, ArXiv.

[712] Pascal Vincent,et al. GSNs : Generative Stochastic Networks , 2015, ArXiv.

[713] Navdeep Jaitly,et al. Pointer Networks , 2015, NIPS.

[714] Xiaodong He,et al. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems , 2015, WWW.

[715] Zhiyuan Liu,et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.

[716] Bolei Zhou,et al. Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[717] Alex Graves,et al. DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[718] Jason Morton,et al. When Does a Mixture of Products Contain a Product of Mixtures? , 2012, SIAM J. Discret. Math.

[719] Jason Weston,et al. Weakly Supervised Memory Networks , 2015, ArXiv.

[720] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[721] Hossein Mobahi,et al. A Theoretical Analysis of Optimization by Gaussian Continuation , 2015, AAAI.

[722] Ian J. Goodfellow,et al. On distinguishability criteria for estimating generative models , 2014, ICLR.

[723] Thomas Brox,et al. Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[724] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.

[725] Gabriel Kreiman,et al. Unsupervised Learning of Visual Structure using Predictive Generative Networks , 2015, ArXiv.

[726] Richard S. Zemel,et al. Generative Moment Matching Networks , 2015, ICML.

[727] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[728] Tapani Raiko,et al. Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[729] Yoshua Bengio,et al. Gated Feedback Recurrent Neural Networks , 2015, ICML.

[730] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.

[731] Shin Ishii,et al. Distributional Smoothing with Virtual Adversarial Training , 2015, ICLR 2016.

[732] Ronan Collobert,et al. From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[733] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[734] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[735] Yoshua Bengio,et al. Low precision arithmetic for deep learning , 2014, ICLR.

[736] Yoshua Bengio,et al. NICE: Non-linear Independent Components Estimation , 2014, ICLR.

[737] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.

[738] Jürgen Schmidhuber,et al. Highway Networks , 2015, ArXiv.

[739] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[740] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[741] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.

[742] Yoshua Bengio,et al. Early Inference in Energy-Based Models Approximates Back-Propagation , 2015, ArXiv.

[743] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[744] Phil Blunsom,et al. Learning to Transduce with Unbounded Memory , 2015, NIPS.

[745] Zhang Chun-xi. Restricted Boltzmann Machines , 2015 .

[746] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[747] Jonathan Tompson,et al. Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[748] Enrique Herrera-Viedma,et al. Sentiment analysis: A review and comparative analysis of web services , 2015, Inf. Sci.

[749] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[750] Wojciech Zaremba,et al. Reinforcement Learning Neural Turing Machines , 2015, ArXiv.

[751] Thomas S. Huang,et al. An Analysis of Unsupervised Pre-training in Light of Recent Advances , 2014, ICLR.

[752] Yoshua Bengio,et al. Training Bidirectional Helmholtz Machines , 2015 .

[753] Shengen Yan,et al. Deep Image: Scaling up Image Recognition , 2015, ArXiv.

[754] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[755] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.

[756] Joelle Pineau,et al. Conditional Computation in Neural Networks for faster models , 2015, ArXiv.

[757] Yoshua Bengio,et al. BilBOWA: Fast Bilingual Distributed Representations without Word Alignments , 2014, ICML.

[758] Jitendra Malik,et al. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[759] Jeffrey Dean,et al. Large-Scale Deep Learning For Building Intelligent Computer Systems , 2016, WSDM.

[760] Yves Grandvalet,et al. Combining Two And Three-Way Embeddings Models for Link Prediction in Knowledge Bases , 2016, J. Artif. Intell. Res.

[761] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[762] Matthias Bethge,et al. A note on the evaluation of generative models , 2015, ICLR.

[763] Ruslan Salakhutdinov,et al. Importance Weighted Autoencoders , 2015, ICLR.

[764] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.

[765] Oriol Vinyals,et al. Multilingual Language Processing From Bytes , 2015, NAACL.

[766] Yoshua Bengio,et al. Knowledge Matters: Importance of Prior Information for Optimization , 2013, J. Mach. Learn. Res.

[767] Alex Graves,et al. Grid Long Short-Term Memory , 2015, ICLR.

[768] Tapani Raiko,et al. Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence , 2016, ESANN.

[769] Phillipp Kaestner,et al. Linear And Nonlinear Programming , 2016 .

[770] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[771] Bozhkov Lachezar,et al. Echo State Network , 2017, Encyclopedia of Machine Learning and Data Mining.

[772] A. Hall,et al. Adaptive Switching Circuits , 2016 .

[773] F. Ramsey. Truth and Probability , 2016 .

[774] Jiri Matas,et al. All you need is a good init , 2015, ICLR.

[775] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[776] Claudia Biermann,et al. Mathematical Methods Of Statistics , 2016 .

[777] Vishal. A. Kharde,et al. Sentiment Analysis of Twitter Data : A Survey of Techniques , 2016, ArXiv.

[778] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[779] Xiaojin Zhu,et al. Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.

[780] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[781] Miles Osborne,et al. Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[782] K. Schittkowski,et al. NONLINEAR PROGRAMMING , 2022 .