Deep Learning
Geoffrey E. Hinton | Yoshua Bengio | Aaron C. Courville | Ian J. Goodfellow | Yann LeCun
[1] O. Perron. Zur Theorie der Matrices , 1907 .
[2] Student,et al. THE PROBABLE ERROR OF A MEAN , 1908 .
[3] R. Fisher. THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .
[4] Kenneth Levenberg. A METHOD FOR THE SOLUTION OF CERTAIN NON-LINEAR PROBLEMS IN LEAST SQUARES , 1944 .
[5] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[6] Robert Price,et al. A useful theorem for nonlinear devices having Gaussian inputs , 1958, IRE Trans. Inf. Theory.
[7] D. Hubel,et al. Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.
[8] J. B. Rosen. The Gradient Projection Method for Nonlinear Programming. Part I. Linear Constraints , 1960 .
[9] Henry J. Kelley,et al. Gradient Theory of Optimal Flight Paths , 1960 .
[10] J. B. Rosen. The gradient projection method for nonlinear programming: Part II , 1961 .
[11] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.
[12] S. Dreyfus. The numerical solution of variational problems , 1962 .
[13] A. E. Bryson,et al. A Steepest-Ascent Method for Solving Optimum Programming Problems , 1962 .
[14] A. A. Mullin,et al. Principles of neurodynamics , 1962 .
[15] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .
[16] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[17] G. Bonnet. Transformations des signaux aléatoires a travers les systèmes non linéaires sans mémoire , 1964 .
[18] D. Hubel,et al. Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.
[19] marquis de L'Hospital. Analyse des infiniment petits, pour l'intelligence des lignes courbes , 1970 .
[20] Vladimir Vapnik,et al. On the uniform convergence of relative frequencies of events to their probabilities , 1971 .
[21] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[22] Roger M. Needham,et al. Note on evaluation , 1973, Inf. Storage Retr..
[23] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[24] J. Besag. Statistical Analysis of Non-Lattice Data , 1975 .
[25] T. Zaslavsky. Facing Up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes , 1975 .
[26] D Marr,et al. Cooperative computation of stereo disparity. , 1976, Science.
[27] S. Linnainmaa. Taylor expansion of the accumulated rounding error , 1976 .
[28] 丸山 徹. Convex Analysisの二,三の進展について , 1977 .
[29] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[30] Frederick Jelinek,et al. Interpolated estimation of Markov source parameters from sparse data , 1980 .
[31] Kunihiko Fukushima,et al. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..
[32] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .
[33] G. Lakoff,et al. Metaphors We Live by , 1982 .
[34] Francis Crick,et al. The function of dream sleep , 1983, Nature.
[35] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[36] Geoffrey E. Hinton,et al. Massively Parallel Architectures for AI: NETL, Thistle, and Boltzmann Machines , 1983, AAAI.
[37] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[38] James R. Wilson. Variance Reduction Techniques for Digital Simulation , 1984 .
[39] D. Rubin. Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician , 1984 .
[40] Geoffrey E. Hinton,et al. A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..
[41] Geoffrey E. Hinton,et al. Symbols Among the Neurons: Details of a Connectionist Inference Architecture , 1985, IJCAI.
[42] N. J. Cohen,et al. Higher-Order Boltzmann Machines , 1986 .
[43] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[44] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[45] Geoffrey E. Hinton,et al. The appeal of parallel distributed processing , 1986 .
[46] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[47] Yann LeCun,et al. Learning processes in an asymmetric threshold network , 1986 .
[48] Johan Håstad,et al. Almost optimal lower bounds for small depth circuits , 1986, STOC '86.
[49] L. Devroye. Non-Uniform Random Variate Generation , 1986 .
[50] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .
[51] Pavel Pudlák,et al. Threshold circuits of bounded depth , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[52] Geoffrey E. Hinton,et al. Learning Representations by Recirculation , 1987, NIPS.
[53] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..
[54] Y. T. Zhou,et al. Computation of optical flow using a neural network , 1988, IEEE 1988 International Conference on Neural Networks.
[55] Lalit R. Bahl,et al. Speech recognition with continuous-parameter hidden Markov models , 1987, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.
[56] Robert A. Jacobs,et al. Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.
[57] Yann LeCun,et al. Generalization and network design strategies , 1989 .
[58] P. Foldiak,et al. Adaptive network for optimal linear feature extraction , 1989, International 1989 Joint Conference on Neural Networks.
[59] Eduardo D. Sontag,et al. Backpropagation Can Give Rise to Spurious Local Minima Even for Networks without Hidden Layers , 1989, Complex Syst..
[60] L.D. Jackel,et al. Analog electronic neural network circuits , 1989, IEEE Circuits and Devices Magazine.
[61] Mohammed Ismail,et al. Analog VLSI Implementation of Neural Systems , 2011, The Kluwer International Series in Engineering and Computer Science.
[62] I. Guyon,et al. Handwritten digit recognition: applications of neural network chips and automatic learning , 1989, IEEE Communications Magazine.
[63] R. Solomonoff. A SYSTEM FOR INCREMENTAL LEARNING BASED ON ALGORITHMIC PROBABILITY , 1989 .
[64] Françoise Fogelman-Soulié,et al. Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition , 1989, EUROSPEECH.
[65] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[66] J. Slawny,et al. Back propagation fails to separate where perceptrons succeed , 1989 .
[67] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[68] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[69] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.
[70] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[71] Hervé Bourlard,et al. Speech pattern discrimination and multilayer perceptrons , 1989 .
[72] Geoffrey E. Hinton. Connectionist Learning Procedures , 1989, Artif. Intell..
[73] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[74] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[75] David Haussler,et al. Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.
[76] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[77] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[78] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..
[79] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.
[80] Kurt Hornik,et al. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks , 1990, Neural Networks.
[81] Jordan B. Pollack,et al. Recursive Distributed Representations , 1990, Artif. Intell..
[82] Geoffrey E. Hinton. Mapping Part-Whole Hierarchies into Connectionist Networks , 1990, Artif. Intell..
[83] S. Mor-Yosef,et al. Ranking the Risk Factors for Cesarean: Logistic Regression Analysis of a Nationwide Study , 1990, Obstetrics and gynecology.
[84] John Cocke,et al. A Statistical Approach to Machine Translation , 1990, CL.
[85] J. Stephen Judd,et al. Neural network design and the complexity of learning , 1990, Neural network modeling and connectionism.
[86] John S. Bridle,et al. Alpha-nets: A recurrent 'neural' network architecture with a hidden Markov model interpretation , 1990, Speech Commun..
[87] Ramanathan V. Guha,et al. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .
[88] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[89] Christian Jutten,et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..
[90] Risto Miikkulainen,et al. Natural Language Processing With Modular PDP Networks and Distributed Lexicon , 1991, Cogn. Sci..
[91] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[92] B. Wolf. The Machine That Changed the World , 1991 .
[93] Yoshua Bengio,et al. Neural Network - Gaussian Mixture Hybrid for Speech Recognition or Density Estimation , 1991, NIPS.
[94] Eduardo Sontag,et al. Turing computability with neural nets , 1991 .
[95] Frank Fallside,et al. A recurrent error propagation network speech recognition system , 1991 .
[96] Geoffrey E. Hinton,et al. Lesioning an attractor network: investigations of acquired dyslexia. , 1991, Psychological review.
[97] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[98] Lawrence D. Jackel,et al. An analog neural network processor with programmable topology , 1991 .
[99] J. L. Holt,et al. Back propagation simulations using limited precision calculations , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[100] D. J. Felleman,et al. Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.
[101] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[102] Yoshua Bengio,et al. Artificial neural networks and their application to sequence recognition , 1991 .
[103] Michael C. Mozer,et al. Induction of Multiscale Temporal Structure , 1991, NIPS.
[104] Jocelyn Sietsma,et al. Creating artificial neural networks that generalize , 1991, Neural Networks.
[105] Yann LeCun,et al. Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.
[106] Dinh Tuan Pham,et al. Separation of a mixture of independent sources through a maximum likelihood approach , 1992 .
[107] Saul B. Gelfand,et al. Classification trees with neural network feature extraction , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[108] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.
[109] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[110] Geoffrey E. Hinton,et al. Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.
[111] Alberto Tesi,et al. On the Problem of Local Minima in Backpropagation , 1992, IEEE Trans. Pattern Anal. Mach. Intell..
[112] E. Capaldi,et al. The organization of behavior. , 1992, Journal of applied behavior analysis.
[113] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.
[114] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[115] Bruce Christianson,et al. Automatic Hessians by reverse accumulation , 1992 .
[116] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[117] Yoshua Bengio,et al. Phonetically motivated acoustic parameters for continuous speech recognition using artificial neural networks , 1991, Speech Commun..
[118] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.
[119] Patrice Marcotte,et al. Novel approaches to the discrimination problem , 1992, ZOR Methods Model. Oper. Res..
[120] C. R. Rao,et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters , 1992 .
[121] Geoffrey E. Hinton,et al. Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.
[122] Saul B. Gelfand,et al. Classification trees with neural network feature extraction , 1992, IEEE Trans. Neural Networks.
[123] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.
[124] Y. C. Pati,et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.
[125] Martin Fodslette Møller,et al. A scaled conjugate gradient algorithm for fast supervised learning , 1993, Neural Networks.
[126] Geoffrey E. Hinton,et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.
[127] Yoshua Bengio,et al. The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.
[128] Jenq-Neng Hwang,et al. Finite Precision Error Analysis of Neural Network Hardware Implementations , 1993, IEEE Trans. Computers.
[129] D. V. van Essen,et al. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.
[130] Patrice Y. Simard,et al. Backpropagation without Multiplication , 1993, NIPS.
[131] Kenji Doya,et al. Bifurcations of Recurrent Neural Networks in Gradient Descent Learning , 1993 .
[132] Hermann Ney,et al. Improved clustering techniques for class-based statistical language modelling , 1993, EUROSPEECH.
[133] J. Elman. Learning and development in neural networks: the importance of starting small , 1993, Cognition.
[134] Allan Pinkus,et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1991, Neural Networks.
[135] Wolfgang Maass,et al. Bounds for the computational power and learning complexity of analog neural nets , 1993, SIAM J. Comput..
[136] R. Vaillant,et al. Original approach for the localisation of objects in images , 1994 .
[137] Pierre L'Ecuyer,et al. Efficiency improvement and variance reduction , 1994, Proceedings of Winter Simulation Conference.
[138] Terence D. Sanger,et al. Neural network learning control of robot manipulators using gradually increasing task difficulty , 1994, IEEE Trans. Robotics Autom..
[139] Clark S. Lindsey,et al. Review of hardware neural networks: A User's perspective , 1994 .
[140] Eduardo Sontag,et al. A Comparison of the Computational Power of Sigmoid and Boolean Threshold Circuits , 1994 .
[141] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.
[142] Jürgen Schmidhuber,et al. Simplifying Neural Nets by Discovering Flat Minima , 1994, NIPS.
[143] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[144] Schuster,et al. Separation of a mixture of independent signals using time delayed correlations. , 1994, Physical review letters.
[145] Pierre Comon,et al. Independent component analysis, A new concept? , 1994, Signal Process..
[146] Geoffrey E. Hinton,et al. Recognizing Handwritten Digits Using Mixtures of Linear Models , 1994, NIPS.
[147] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[148] R. L. Haggard,et al. A fixed point implementation of the backpropagation learning algorithm , 1994, Proceedings of SOUTHEASTCON '94.
[149] S. Srihari. Mixture Density Networks , 1994 .
[150] Peter Tiňo,et al. Learning long-term dependencies is not as difficult with NARX recurrent neural networks , 1995 .
[151] Christopher M. Bishop,et al. Regularization and complexity control in feed-forward networks , 1995 .
[152] L. Ljung,et al. Overtraining, regularization and searching for a minimum, with application to neural networks , 1995 .
[153] Carl E. Rasmussen,et al. In Advances in Neural Information Processing Systems , 2011 .
[154] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[155] J. E. Jackson,et al. Statistical Factor Analysis and Related Methods: Theory and Applications , 1995 .
[156] J. J. Moré,et al. Global continuation for distance geometry problems , 1995 .
[157] Geoffrey E. Hinton,et al. The Helmholtz Machine , 1995, Neural Computation.
[158] Geoffrey E. Hinton,et al. The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.
[159] Eduard Aved’yan,et al. Multilayer Neural Networks , 1995 .
[160] Jonathan Baxter,et al. Learning internal representations , 1995, COLT '95.
[161] Hava T. Siegelmann,et al. On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..
[162] Yoshua Bengio,et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.
[163] S. Mase,et al. Consistency of the Maximum Pseudo-Likelihood Estimator of Continuous State Space Gibbsian Processes , 1995 .
[164] H T Siegelmann,et al. Dating and Context of Three Middle Stone Age Sites with Bone Points in the Upper Semliki Valley, Zaire , 2007 .
[165] Christopher M. Bishop,et al. Current address: Microsoft Research, , 2022 .
[166] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .
[167] Michael I. Jordan,et al. Exploiting Tractable Substructures in Intractable Networks , 1995, NIPS.
[168] Yochai Konig,et al. REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition , 1995, NIPS.
[169] C. Lee Giles,et al. An analysis of noise in recurrent neural networks: convergence and generalization , 1996, IEEE Trans. Neural Networks.
[170] Geoffrey E. Hinton,et al. The EM algorithm for mixtures of factor analyzers , 1996 .
[171] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..
[172] Yann LeCun,et al. Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation , 1996, Neural Networks: Tricks of the Trade.
[173] Heikki Hyotyniemi,et al. Turing Machines Are Recurrent Neural Networks , 1996 .
[174] Radford M. Neal. Sampling from multimodal distributions using tempered transitions , 1996, Stat. Comput..
[175] David H. Wolpert. The Lack of A Priori Distinctions Between Learning Algorithms , 1996 .
[176] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[177] Geoffrey E. Hinton,et al. Varieties of Helmholtz Machine , 1996, Neural Networks.
[178] Jürgen Schmidhuber,et al. Sequential neural text compression , 1996, IEEE Trans. Neural Networks.
[179] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.
[180] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[181] Yoav Freund,et al. Game theory, on-line prediction and boosting , 1996, COLT '96.
[182] Brian Kingsbury,et al. Spert-II: A Vector Microprocessor System , 1996, Computer.
[183] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[184] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[185] B. Sparkes. The Red and the Black: Studies in Greek Pottery , 1996 .
[186] Yoshua Bengio,et al. Training Methods for Adaptive Boosting of Neural Networks , 1997, NIPS.
[187] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[188] Geoffrey E. Hinton,et al. Modeling the manifolds of images of handwritten digits , 1997, IEEE Trans. Neural Networks.
[189] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[190] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..
[191] Ah Chung Tsoi,et al. Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.
[192] Alessandro Sperduti,et al. On the Efficient Classification of Data Structures by Neural Networks , 1997, IJCAI.
[193] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[194] C. Jarzynski. Nonequilibrium Equality for Free Energy Differences , 1996, cond-mat/9610209.
[195] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[196] George Trapp,et al. Using Complex Variables to Estimate Derivatives of Real Functions , 1998, SIAM Rev..
[197] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.
[198] Eduardo Sontag. VC dimension of neural networks , 1998 .
[199] Brendan J. Frey,et al. Graphical Models for Machine Learning and Digital Communication , 1998 .
[200] D. Simons,et al. Failure to detect changes to people during a real-world interaction , 1998 .
[201] Geoffrey E. Hinton,et al. A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[202] Alessandro Sperduti,et al. A general framework for adaptive processing of data structures , 1998, IEEE Trans. Neural Networks.
[203] Georges Raepsaet. David W. Tandy & Walter C. Neale, Hesiod's Works and Days. A Translation and Commentary for the Social Sciences , 1998 .
[204] Alexander J. Smola,et al. Learning with kernels , 1998 .
[205] Aapo Hyvärinen,et al. Emergence of Topography and Complex Cell Properties from Natural Images using Extensions of ICA , 1999, NIPS.
[206] Aapo Hyvärinen,et al. Nonlinear independent component analysis: Existence and uniqueness results , 1999, Neural Networks.
[207] Samy Bengio,et al. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks , 1999, NIPS.
[208] Aapo Hyvärinen,et al. Survey on Independent Component Analysis , 1999 .
[209] Giovanni Soda,et al. Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..
[210] B. Schölkopf,et al. Advances in kernel methods: support vector learning , 1999 .
[211] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[212] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .
[213] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[214] Terrence J. Sejnowski,et al. Unsupervised Learning , 2018, Encyclopedia of GIS.
[215] Mike Schuster,et al. On supervised learning from sequential data with applications for speech recognition , 1999 .
[216] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[217] Samy Bengio,et al. Taking on the curse of dimensionality in joint distributions using neural networks , 2000, IEEE Trans. Neural Networks Learn. Syst..
[218] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.
[219] Shaogang Gong,et al. Dynamic Vision - From Images to Face Recognition , 2000 .
[220] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[221] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[222] Juha Karhunen,et al. Nonlinear Independent Component Analysis Using Ensemble Learning: Experiments and Discussion , 2000 .
[223] Geoffrey E. Hinton,et al. Extracting distributed representations of concepts and relations from positive and negative propositions , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.
[224] M. Sur,et al. Visual behaviour mediated by retinal projections directed to the auditory pathway , 2000, Nature.
[225] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.
[226] Yoshua Bengio,et al. Incorporating Second-order Functional Knowledge for Better Option Pricing , 2022 .
[227] Yoshua Bengio,et al. Gradient-Based Optimization of Hyperparameters , 2000, Neural Computation.
[228] Sven Behnke,et al. Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid , 2001, Int. J. Comput. Intell. Appl..
[229] Radford M. Neal. Annealed importance sampling , 1998, Stat. Comput..
[230] Joshua Goodman,et al. Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[231] Paul A. Viola,et al. Robust Real-time Object Detection , 2001 .
[232] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[233] Geoffrey E. Hinton,et al. Global Coordination of Local Linear Models , 2001, NIPS.
[234] DeLiang Wang,et al. Unsupervised Learning: Foundations of Neural Computation , 2001, AI Mag..
[235] Aapo Hyvärinen,et al. Topographic Independent Component Analysis , 2001, Neural Computation.
[236] Mikhail Belkin,et al. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.
[237] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[238] Yukito Iba. EXTENDED ENSEMBLE MONTE CARLO , 2001 .
[239] D K Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc..
[240] Yee Whye Teh,et al. A New View of ICA , 2001 .
[241] Refractor. Metamorphoses , 1868, The Lancet.
[242] Stan Lipovetsky,et al. Latent Variable Models and Factor Analysis , 2001, Technometrics.
[243] Geoffrey E. Hinton,et al. Self Supervised Boosting , 2002, NIPS.
[244] Geoffrey E. Hinton,et al. Learning Sparse Topographic Representations with Products of Student-t Distributions , 2002, NIPS.
[245] Geoffrey E. Hinton,et al. Stochastic Neighbor Embedding , 2002, NIPS.
[246] Samy Bengio,et al. A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.
[247] Christopher K. I. Williams,et al. Products of Gaussians and Probabilistic Minor Component Analysis , 2002, Neural Computation.
[248] Herbert Jaeger,et al. Adaptive Nonlinear System Identification with Echo State Networks , 2002, NIPS.
[249] Pascal Vincent,et al. Manifold Parzen Windows , 2002, NIPS.
[250] Terrence J. Sejnowski,et al. Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.
[251] Bernhard Schölkopf,et al. Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.
[252] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[253] F. Huang,et al. Generalized Pseudo-Likelihood Estimates for Markov Random Fields on Lattice , 2002 .
[254] Matthew Brand,et al. Charting a Manifold , 2002, NIPS.
[255] Feng-Hsiung Hsu,et al. Behind Deep Blue: Building the Computer that Defeated the World Chess Champion , 2002 .
[256] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.
[257] Jean-Luc Gauvain,et al. Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[258] Yoshua Bengio,et al. No Unbiased Estimator of the Variance of K-Fold Cross-Validation , 2003, J. Mach. Learn. Res..
[259] James Henderson. Inducing History Representations for Broad Coverage Statistical Parsing , 2003, HLT-NAACL.
[260] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..
[261] Tony R. Martinez,et al. The general inefficiency of batch training for gradient descent learning , 2003, Neural Networks.
[262] Yoshua Bengio,et al. Quick Training of Probabilistic Neural Nets by Importance Sampling , 2003 .
[263] Mikhail Belkin,et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.
[264] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[265] D. Donoho,et al. Hessian Eigenmaps : new locally linear embedding techniques for high-dimensional data , 2003 .
[266] Yee Whye Teh,et al. Energy-Based Models for Sparse Overcomplete Representations , 2003, J. Mach. Learn. Res..
[267] William S. Rayens,et al. Independent Component Analysis: Principles and Practice , 2003, Technometrics.
[268] Valeriu Beiu,et al. VLSI implementations of threshold logic-a comprehensive survey , 2003, IEEE Trans. Neural Networks.
[269] Kunihiko Fukushima,et al. Cognitron: A self-organizing multilayered neural network , 1975, Biological Cybernetics.
[270] Harald Haas,et al. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication , 2004, Science.
[271] James Henderson,et al. Discriminative Training of a Neural Network Statistical Parser , 2004, ACL.
[272] Geoffrey E. Hinton,et al. Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.
[273] Geoffrey E. Hinton,et al. Neighbourhood Components Analysis , 2004, NIPS.
[274] G. Peterson. A day of great illumination: B. F. Skinner's discovery of shaping. , 2004, Journal of the experimental analysis of behavior.
[275] Dario L. Ringach,et al. Reverse correlation in neurophysiology , 2004, Cogn. Sci..
[276] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[277] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .
[278] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[279] Christophe Garcia,et al. Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[280] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.
[281] Yann LeCun,et al. Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..
[282] Kilian Q. Weinberger,et al. Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..
[283] James L. McClelland,et al. Semantic Cognition: A Parallel Distributed Processing Approach , 2004 .
[284] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[285] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[286] Yoshua Bengio,et al. Non-Local Manifold Tangent Learning , 2004, NIPS.
[288] H. Bourlard,et al. Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.
[289] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[290] M. Tribus,et al. Probability theory: the logic of science , 2003 .
[291] Lawrence Cayton,et al. Algorithms for manifold learning , 2005 .
[292] H. Inayoshi,et al. Improved Generalization by Adding both Auto-Association and Hidden-Layer-Noise to Neural-Network-Based-Classifiers , 2005, 2005 IEEE Workshop on Machine Learning for Signal Processing.
[293] David J. Field,et al. How Close Are We to Understanding V1? , 2005, Neural Computation.
[294] Radford M. Neal. Estimating Ratios of Normalizing Constants Using Linked Importance Sampling , 2005, math/0511216.
[295] Laurenz Wiskott,et al. Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.
[296] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[297] Paola Velardi,et al. Structural semantic interconnections: a knowledge-based approach to word sense disambiguation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[298] Johan Håstad,et al. On the power of small-depth threshold circuits , 1991, computational complexity.
[299] C. Koch,et al. Invariant visual representation by single neurons in the human brain , 2005, Nature.
[300] Eero P. Simoncelli,et al. Spatiotemporal Elements of Macaque V1 Receptive Fields , 2005, Neuron.
[301] Nicolas Le Roux,et al. Convex Neural Networks , 2005, NIPS.
[302] Adi Shraibman,et al. Rank, Trace-Norm and Max-Norm , 2005, COLT.
[303] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[304] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..
[305] Geoffrey E. Hinton. What kind of graphical model is the brain? , 2005, IJCAI.
[306] Thomas P. Minka,et al. Divergence measures and message passing , 2005 .
[307] Pascal Vincent,et al. Non-Local Manifold Parzen Windows , 2005, NIPS.
[308] Nicolas Le Roux,et al. The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.
[309] Yann LeCun,et al. Toward automatic phenotyping of developing embryos from videos , 2005, IEEE Transactions on Image Processing.
[310] Yoshua Bengio,et al. Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.
[311] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.
[312] Patrice Y. Simard,et al. Using GPUs for machine learning algorithms , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).
[313] Marta R. Costa-jussà,et al. Continuous space language models for the IWSLT 2006 task , 2006, IWSLT.
[314] Patrice Y. Simard,et al. High Performance Convolutional Neural Networks for Document Processing , 2006 .
[315] Rich Caruana,et al. Model compression , 2006, KDD '06.
[316] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[317] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[318] Geoffrey E. Hinton,et al. Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.
[319] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .
[320] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[321] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[322] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.
[323] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[324] Kaare Brandt Petersen,et al. The Matrix Cookbook , 2006 .
[325] Holger Schwenk,et al. Continuous Space Language Models for Statistical Machine Translation , 2006, ACL.
[326] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.
[327] Tom Minka,et al. Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[328] Holger Schwenk,et al. Continuous space language models , 2007, Comput. Speech Lang..
[329] Max Welling,et al. Products of Experts , 2007 .
[330] Geoffrey E. Hinton. Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.
[331] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.
[332] Aapo Hyvärinen,et al. Connections Between Score Matching, Contrastive Divergence, and Pseudolikelihood for Continuous-Valued Variables , 2007, IEEE Transactions on Neural Networks.
[333] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.
[334] Geoffrey E. Hinton,et al. Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[335] James Bennett,et al. The Netflix Prize , 2007 .
[336] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[337] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.
[338] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.
[339] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[340] Jürgen Schmidhuber,et al. Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.
[341] Yann LeCun,et al. Online Learning for Offroad Robots: Spatial Label Propagation to Learn Long-Range Traversability , 2007, Robotics: Science and Systems.
[342] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .
[343] Geoffrey E. Hinton,et al. Modeling image patches with a directed hierarchy of Markov random fields , 2007, NIPS.
[344] Laurenz Wiskott,et al. Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells , 2007, PLoS Comput. Biol..
[345] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.
[346] Herbert Jaeger,et al. Optimization and applications of echo state networks with leaky-integrator neurons , 2007, Neural Networks.
[347] Geoffrey E. Hinton,et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.
[348] Geoffrey E. Hinton,et al. Three new graphical models for statistical language modelling , 2007, ICML '07.
[349] Shawki Areibi,et al. The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study , 2007, IEEE Transactions on Neural Networks.
[350] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[351] Aapo Hyvärinen,et al. Some extensions of score matching , 2007, Comput. Stat. Data Anal..
[352] Ruslan Salakhutdinov,et al. Probabilistic Matrix Factorization , 2007, NIPS.
[353] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[354] Geoffrey E. Hinton,et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes , 2007, NIPS.
[355] Joseph F. Murray,et al. Supervised Learning of Image Restoration with Convolutional Networks , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[356] Geoffrey E. Hinton,et al. The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.
[357] Ruslan Salakhutdinov,et al. On the quantitative analysis of deep belief networks , 2008, ICML '08.
[358] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[359] Jim Hefferon,et al. Linear Algebra , 2012 .
[360] Geoffrey E. Hinton,et al. Deep, Narrow Sigmoid Belief Networks Are Universal Approximators , 2008, Neural Computation.
[361] Nicolas Le Roux,et al. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks , 2008, Neural Computation.
[362] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[363] Sunita Sarawagi. Learning with Graphical Models , 2008 .
[364] Nicolas Pinto,et al. Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..
[365] Yoshua Bengio,et al. Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model , 2008, IEEE Transactions on Neural Networks.
[366] Tijmen Tieleman,et al. Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.
[367] D. Lizotte. Practical bayesian optimization , 2008 .
[368] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[369] Antonio Torralba,et al. Spectral Hashing , 2008, NIPS.
[370] Yoshua Bengio,et al. Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.
[371] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[372] Geoffrey E. Hinton,et al. A Scalable Hierarchical Distributed Language Model , 2008, NIPS.
[373] Alex Graves,et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.
[374] David M. Bradley,et al. Differentiable Sparse Coding , 2008, NIPS.
[375] Niko Wilbert,et al. Invariant Object Recognition with Slow Feature Analysis , 2008, ICANN.
[376] Herbert Jaeger,et al. Discovering multiscale dynamical features with hierarchical Echo State Networks , 2008 .
[377] Uwe Naumann,et al. Optimal Jacobian accumulation is NP-complete , 2007, Math. Program..
[378] Yoshua Bengio,et al. Zero-data Learning of New Tasks , 2008, AAAI.
[379] Geoffrey E. Hinton,et al. Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.
[380] Yoshua Bengio,et al. Slow, Decorrelated Features for Pretraining Complex Cell-like Networks , 2009, NIPS.
[381] Yoshua Bengio,et al. Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..
[382] H. Sebastian Seung,et al. Maximin affinity learning of image segmentation , 2009, NIPS.
[383] Kunle Olukotun,et al. A highly scalable Restricted Boltzmann Machine FPGA implementation , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[384] Geoffrey E. Hinton,et al. Zero-shot Learning with Semantic Output Codes , 2009, NIPS.
[385] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[386] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[387] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[388] Yann LeCun,et al. Learning long‐range vision for autonomous off‐road driving , 2009, J. Field Robotics.
[389] Quoc V. Le,et al. Measuring Invariances in Deep Networks , 2009, NIPS.
[390] Geoffrey E. Hinton,et al. Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.
[391] Aapo Hyvärinen,et al. Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.
[392] J. Schmidhuber,et al. A Novel Connectionist System for Unconstrained Handwriting Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[393] R. Fergus,et al. Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[394] Geoffrey E. Hinton,et al. 3D Object Recognition with Deep Belief Nets , 2009, NIPS.
[395] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[396] Siwei Lyu,et al. Interpretation and Generalization of Score Matching , 2009, UAI.
[397] Herbert Jaeger,et al. Reservoir computing approaches to recurrent neural network training , 2009, Comput. Sci. Rev..
[398] Varun Chandola,et al. Anomaly detection: A survey , 2009, CSUR.
[399] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.
[400] M. Del Giudice,et al. Programmed to learn? The ontogeny of mirror neurons. , 2009, Developmental science.
[401] Yehuda Koren,et al. The BellKor Solution to the Netflix Grand Prize , 2009 .
[402] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[403] Manfred Opper,et al. The Variational Gaussian Approximation Revisited , 2009, Neural Computation.
[404] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.
[405] Ruslan Salakhutdinov,et al. Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.
[406] Pascal Vincent,et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training , 2009, AISTATS.
[407] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason..
[408] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[409] Kai A. Krueger,et al. Flexible shaping: How learning in small steps helps , 2009, Cognition.
[410] Hossein Mobahi,et al. Deep learning from temporal coherence in video , 2009, ICML '09.
[411] Rajat Raina,et al. Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.
[412] Geoffrey E. Hinton,et al. Deep Belief Networks for phone recognition , 2009 .
[413] Hugo Larochelle,et al. Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.
[414] Quoc V. Le,et al. Tiled convolutional neural networks , 2010, NIPS.
[415] Geoffrey E. Hinton,et al. Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.
[416] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[417] Dong Yu,et al. Sequential Labeling Using Deep-Structured Conditional Random Fields , 2010, IEEE Journal of Selected Topics in Signal Processing.
[418] Geoffrey E. Hinton,et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.
[419] Yann LeCun,et al. Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields , 2010, ArXiv.
[421] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[422] Valentin I. Spitkovsky,et al. From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing , 2010, NAACL.
[423] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[424] Nicolas Le Roux,et al. Deep Belief Networks Are Compact Universal Approximators , 2010, Neural Computation.
[425] Nicolas Le Roux,et al. The Learning Workshop Snowbird , 2010 .
[426] Ilya Sutskever,et al. On the Convergence Properties of Contrastive Divergence , 2010, AISTATS.
[427] Bo Chen,et al. Deep Learning of Invariant Spatio-Temporal Features from Video , 2010 .
[428] Julian Eggert,et al. Binary Sparse Coding , 2010, LVA/ICA.
[429] Yann LeCun,et al. Regularized estimation of image statistics by Score Matching , 2010, NIPS.
[430] Tapani Raiko,et al. Parallel tempering is efficient for learning restricted Boltzmann machines , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).
[431] Geoffrey E. Hinton,et al. Generating more realistic images using gated MRF's , 2010, NIPS.
[432] Pascal Vincent,et al. Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.
[433] Geoffrey E. Hinton,et al. Learning to Detect Roads in High-Resolution Aerial Images , 2010, ECCV.
[434] Y-Lan Boureau,et al. Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.
[435] Geoffrey E. Hinton,et al. Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine , 2010, NIPS.
[436] Fei-Fei Li,et al. What Does Classifying More Than 10,000 Image Categories Tell Us? , 2010, ECCV.
[437] Daphne Koller,et al. Self-Paced Learning for Latent Variable Models , 2010, NIPS.
[438] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[439] J. G. Garson,et al. The metric system of identification of criminals, as used in Great Britain and Ireland. , 2010 .
[440] Jason Weston,et al. Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.
[441] Martin Pál,et al. Contextual Multi-Armed Bandits , 2010, AISTATS.
[442] Lise Getoor,et al. Learning in Logic , 2010, Encyclopedia of Machine Learning.
[443] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[444] Hariharan Narayanan,et al. Sample Complexity of Testing the Manifold Hypothesis , 2010, NIPS.
[445] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.
[446] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10 , 2010 .
[447] Nando de Freitas,et al. Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.
[448] Yann LeCun,et al. Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[449] Marc'Aurelio Ranzato,et al. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.
[450] Chris Eliasmith,et al. Deep networks for robust visual recognition , 2010, ICML.
[451] Yoshua Bengio,et al. Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.
[452] Pascal Vincent,et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..
[453] Geoffrey E. Hinton,et al. Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[454] Geoffrey E. Hinton,et al. Binary coding of speech spectrograms using a deep auto-encoder , 2010, INTERSPEECH.
[455] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.
[456] Geoffrey E. Hinton,et al. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.
[457] Yann LeCun,et al. Learning Fast Approximations of Sparse Coding , 2010, ICML.
[458] Rocco A. Servedio,et al. Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate , 2010, ICML.
[459] Peggy Seriès,et al. Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model , 2010, NIPS.
[460] Jan Peters,et al. Policy Gradient Methods , 2010, Encyclopedia of Machine Learning.
[461] Ian J. Goodfellow,et al. Help me help you: Interfaces for personal robots , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[462] Joseph F. Murray,et al. Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation , 2010, Neural Computation.
[463] Andrew Y. Ng,et al. Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .
[464] Yoshua Bengio,et al. Algorithms for Hyper-Parameter Optimization , 2011, NIPS.
[465] Nando de Freitas,et al. Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood , 2011, UAI.
[466] Veselin Stoyanov,et al. Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.
[467] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.
[468] Brendan J. Frey,et al. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context , 2011, Bioinform..
[469] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.
[470] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .
[471] Nicolas Pinto,et al. Beyond simple features: A large-scale feature search approach to unconstrained face recognition , 2011, Face and Gesture 2011.
[472] Pascal Vincent,et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.
[473] Tapani Raiko,et al. Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.
[474] Jason Weston,et al. Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.
[475] Zhenghao Chen,et al. On Random Weights and Unsupervised Feature Learning , 2011, ICML.
[476] Pascal Vincent,et al. Higher Order Contractive Auto-Encoder , 2011, ECML/PKDD.
[477] Yoshua Bengio,et al. On Tracking The Partition Function , 2011, NIPS.
[478] Nando de Freitas,et al. On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.
[479] Hugo Larochelle,et al. The Neural Autoregressive Distribution Estimator , 2011, AISTATS.
[480] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[481] Yann LeCun,et al. Unsupervised Learning of Sparse Features for Scalable Audio Classification , 2011, ISMIR.
[482] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[483] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..
[484] Geoffrey E. Hinton,et al. Learning a better representation of speech soundwaves using restricted boltzmann machines , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[485] Yoshua Bengio,et al. Shallow vs. Deep Sum-Product Networks , 2011, NIPS.
[486] Mohamed Chtourou,et al. On the training of recurrent neural networks , 2011, Eighth International Multi-Conference on Systems, Signals & Devices.
[487] Yoshua Bengio,et al. Unsupervised Models of Images by Spike-and-Slab RBMs , 2011, ICML.
[488] Peggy Seriès,et al. Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability , 2011, NIPS.
[489] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[490] David J. Fleet,et al. Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.
[491] Kewei Tu,et al. On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars , 2011, IJCAI.
[492] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.
[493] Andrew Y. Ng,et al. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.
[494] Kevin Leyton-Brown,et al. Sequential Model-Based Optimization for General Algorithm Configuration , 2011, LION.
[495] Lukás Burget,et al. Empirical Evaluation and Combination of Advanced Language Modeling Techniques , 2011, INTERSPEECH.
[496] Pascal Vincent,et al. A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.
[497] Geoffrey E. Hinton,et al. Using very deep autoencoders for content-based image retrieval , 2011, ESANN.
[498] Yong Jae Lee,et al. Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.
[499] Pascal Vincent,et al. The Manifold Tangent Classifier , 2011, NIPS.
[500] Ruimin Shen,et al. Learning Class-relevant Features and Class-irrelevant Features via a Hybrid third-order RBM , 2011, AISTATS.
[501] Geoffrey E. Hinton,et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction , 2011, UAI.
[502] Nicolas Le Roux,et al. Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.
[503] Andrew Y. Ng,et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.
[504] David Cox,et al. Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook , 2011, CVPR 2011 WORKSHOPS.
[505] Yoshua Bengio,et al. Large-Scale Learning of Embeddings with Reconstruction Sampling , 2011, ICML.
[506] Jeffrey Pennington,et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.
[507] Yoshua Bengio,et al. Incorporating complex cells into neural networks for pattern classification , 2011 .
[508] John Langford,et al. Scaling up machine learning: parallel and distributed approaches , 2011, KDD '11 Tutorials.
[509] Berin Martini,et al. Large-Scale FPGA-based Convolutional Networks , 2011 .
[510] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[511] Tara N. Sainath,et al. Deep Belief Networks using discriminative features for phone recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[512] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.
[513] Nihat Ay,et al. Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines , 2010, Neural Computation.
[514] Lukás Burget,et al. Strategies for training large scale neural network language models , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[515] Jeffrey Pennington,et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.
[516] Pedro M. Domingos,et al. Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[517] Ronan Collobert,et al. Deep Learning for Efficient Discriminative Parsing , 2011, AISTATS.
[518] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[519] Bilge Mutlu,et al. How Do Humans Teach: On Curriculum Learning and Teaching Dimension , 2011, NIPS.
[520] Vincent Vanhoucke,et al. Improving the speed of neural networks on CPUs , 2011 .
[521] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[522] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[523] Jürgen Schmidhuber,et al. Self-Delimiting Neural Networks , 2012, ArXiv.
[524] Yoshua Bengio,et al. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription , 2012, ICML.
[525] Bernhard Schölkopf,et al. A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..
[526] T. Ciodaro,et al. Online particle detection with Neural Networks based on topological calorimetry information , 2012 .
[527] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[528] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.
[529] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[530] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[531] Herbert Jaeger,et al. Long Short-Term Memory in Echo State Networks: Details of a Simulation Study , 2012 .
[532] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.
[533] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.
[534] Yann LeCun,et al. Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.
[535] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[536] Yee Whye Teh,et al. A fast and simple algorithm for training neural probabilistic language models , 2012, ICML.
[537] Jürgen Schmidhuber,et al. Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.
[538] Yoshua Bengio,et al. Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.
[539] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[540] Trevor Darrell,et al. Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[541] Klaus-Robert Müller,et al. Deep Boltzmann Machines and the Centering Trick , 2012, Neural Networks: Tricks of the Trade.
[542] Misha Denil,et al. Learning Where to Attend with Deep Architectures for Image Tracking , 2011, Neural Computation.
[543] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.
[544] Stefan J. Kiebel,et al. Re-visiting the echo state property , 2012, Neural Networks.
[545] E. Culurciello,et al. NeuFlow: Dataflow vision processing system-on-a-chip , 2012, 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS).
[546] Tomáš Mikolov. Statistical Language Models Based on Neural Networks , 2012 .
[547] Yann LeCun,et al. Convolutional neural networks applied to house numbers digit classification , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).
[548] Yoshua Bengio,et al. Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery , 2012, ArXiv.
[549] Yoshua Bengio,et al. A Generative Process for sampling Contractive Auto-Encoders , 2012, ICML 2012.
[550] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[551] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[552] Ivan Titov,et al. Inducing Crosslingual Distributed Representations of Words , 2012, COLING.
[553] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[554] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[555] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.
[556] Bernhard Schölkopf,et al. On causal and anticausal learning , 2012, ICML.
[557] Jason Weston,et al. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing , 2012, AISTATS.
[558] Geoffrey E. Hinton,et al. Deep Mixtures of Factor Analysers , 2012, ICML.
[559] Quoc V. Le,et al. Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.
[560] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[561] Tara N. Sainath,et al. Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[562] Deva Ramanan,et al. Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[563] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[564] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[565] Jason Weston,et al. Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.
[566] Hugo Larochelle,et al. RNADE: The real-valued neural autoregressive density-estimator , 2013, NIPS.
[567] Geoffrey E. Hinton,et al. Modeling Natural Images Using Gated MRFs , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[568] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[569] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[570] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.
[571] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[572] Koray Kavukcuoglu,et al. Learning word embeddings efficiently with noise-contrastive estimation , 2013, NIPS.
[573] Nitish Srivastava,et al. Modeling Documents with Deep Boltzmann Machines , 2013, UAI.
[574] Yoshua Bengio,et al. Multi-Prediction Deep Boltzmann Machines , 2013, NIPS.
[575] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[576] Nitish Srivastava,et al. Improving Neural Networks with Dropout , 2013 .
[577] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[578] Geoffrey E. Hinton,et al. On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[579] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[580] Yoshua Bengio,et al. Deep Learning of Representations: Looking Forward , 2013, SLSP.
[581] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[582] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[583] Ian J. Goodfellow,et al. Pylearn2: a machine learning research library , 2013, ArXiv.
[584] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[585] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[586] Diederik P. Kingma. Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form , 2013, ArXiv.
[587] Honglak Lee,et al. Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines , 2013, ICML.
[588] Yann LeCun,et al. Indoor Semantic Segmentation using depth information , 2013, ICLR.
[589] Phil Blunsom,et al. Recurrent Continuous Translation Models , 2013, EMNLP.
[590] Yann LeCun,et al. Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[591] Meng Cai,et al. Deep maxout neural networks for speech recognition , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[592] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[593] Yoshua Bengio,et al. Maxout Networks , 2013, ICML.
[594] Laurenz Wiskott,et al. How to Center Binary Deep Boltzmann Machines , 2013, 1311.1354.
[595] Léon Bottou,et al. From machine learning to machine reasoning , 2011, Machine Learning.
[596] Yoshua Bengio,et al. Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs , 2013, NIPS.
[597] Yoshua Bengio,et al. Better Mixing via Deep Representations , 2012, ICML.
[598] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[599] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.
[600] Srinivas C. Turaga,et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina , 2013, Nature.
[601] Benjamin Schrauwen,et al. Training energy-based models for time-series imputation , 2013, J. Mach. Learn. Res.
[602] Sumit Basu,et al. Teaching Classification Boundaries to Humans , 2013, AAAI.
[603] Jason Weston,et al. A semantic matching energy function for learning with multi-relational data , 2013, Machine Learning.
[604] Christopher D. Manning,et al. Fast dropout training , 2013, ICML.
[605] Yoshua Bengio,et al. Scaling Up Spike-and-Slab Models for Unsupervised Feature Learning , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[606] Benjamin Schrauwen,et al. Deep content-based music recommendation , 2013, NIPS.
[607] Yoshua Bengio,et al. Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions , 2012, AISTATS.
[608] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[609] Oswin Krause,et al. Approximation properties of DBNs with binary hidden units and real-valued visible units , 2013, ICML.
[610] Larry P. Heck,et al. Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.
[611] Razvan Pascanu,et al. Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.
[612] Karol Gregor,et al. Neural Variational Inference and Learning in Belief Networks , 2014, ICML.
[613] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.
[614] Richard M. Schwartz,et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.
[615] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[616] Wojciech Zaremba,et al. Learning to Execute , 2014, ArXiv.
[617] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[618] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[619] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.
[620] Yoshua Bengio,et al. The Spike-and-Slab RBM and Extensions to Discrete and Sparse Data Distributions , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[621] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[622] Ronan Collobert,et al. Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.
[623] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[624] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[625] Zhen Wang,et al. Knowledge Graph Embedding by Translating on Hyperplanes , 2014, AAAI.
[626] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[627] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[628] Yoshua Bengio,et al. What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res.
[629] Jason Weston,et al. Question Answering with Subgraph Embeddings , 2014, EMNLP.
[630] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[631] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.
[632] Franz Pernkopf,et al. General Stochastic Networks for Classification , 2014, NIPS.
[633] David Sussillo,et al. Random Walks: Training Very Deep Nonlinear Feed-Forward Networks with Smart Initialization , 2014, ArXiv.
[634] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[635] Yoshua Bengio,et al. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.
[636] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[637] James Martens,et al. On the Expressive Efficiency of Sum Product Networks , 2014, ArXiv.
[638] Jasper Snoek,et al. Freeze-Thaw Bayesian Optimization , 2014, ArXiv.
[639] Razvan Pascanu,et al. How to Construct Deep Recurrent Neural Networks , 2013, ICLR.
[640] Jürgen Schmidhuber,et al. A Clockwork RNN , 2014, ICML.
[641] Phil Blunsom,et al. Learning Bilingual Word Representations by Marginalizing Alignments , 2014, ACL.
[642] Parul Parashar,et al. Neural Networks in Machine Learning , 2014 .
[643] P. Baldi,et al. Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.
[644] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[645] Yoshua Bengio,et al. Deep Generative Stochastic Networks Trainable by Backprop , 2013, ICML.
[646] Max Welling,et al. Semi-supervised Learning with Deep Generative Models , 2014, NIPS.
[647] Daan Wierstra,et al. Deep AutoRegressive Networks , 2013, ICML.
[648] Yoshua Bengio,et al. An empirical analysis of dropout in piecewise linear networks , 2013, ICLR.
[649] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[650] Yann LeCun,et al. The Loss Surface of Multilayer Networks , 2014, ArXiv.
[651] Hugo Larochelle,et al. A Deep and Tractable Density Estimator , 2013, ICML.
[652] Guido Montúfar,et al. Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units , 2013, Neural Computation.
[653] Daniel L. K. Yamins,et al. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition , 2014, PLoS Comput. Biol.
[654] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[655] Tom Schaul,et al. Unit Tests for Stochastic Optimization , 2013, ICLR.
[656] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[657] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[658] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[659] Max Welling,et al. Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets , 2014, ICML.
[660] W. Karush. Minima of Functions of Several Variables with Inequalities as Side Conditions , 2014 .
[661] Matthias Bethge,et al. How close are we to understanding image-based saliency? , 2014, ArXiv.
[662] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[663] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[664] Yaroslav Bulatov,et al. Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.
[665] Ivan Laptev,et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[666] Razvan Pascanu,et al. On the number of inference regions of deep feed forward networks with piece-wise linear activations , 2013, ICLR.
[667] Tapani Raiko,et al. Iterative Neural Autoregressive Distribution Estimator NADE-k , 2014, NIPS.
[668] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.
[669] Christian Osendorfer,et al. Learning Stochastic Recurrent Networks , 2014, NIPS 2014.
[670] Brendan J. Frey,et al. Deep learning of the tissue-regulated splicing code , 2014, Bioinform.
[671] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[672] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.
[673] Surya Ganguli,et al. Analyzing noise in autoencoders and deep networks , 2014, ArXiv.
[674] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[675] Balázs Kégl,et al. The Higgs boson machine learning challenge , 2014, HEPML@NIPS.
[676] Zhen Wang,et al. Knowledge Graph and Text Jointly Embedding , 2014, EMNLP.
[677] Dong Yu,et al. Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process.
[678] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[679] Jian Zhou,et al. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction , 2014, ICML.
[680] Roland Memisevic,et al. The Potential Energy of an Autoencoder , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[681] Yoshua Bengio,et al. A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.
[682] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[683] Ferenc Huszar,et al. How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? , 2015, ArXiv.
[684] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[685] Yoshua Bengio,et al. On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.
[686] Nadav Cohen,et al. On the Expressive Power of Deep Learning: A Tensor Analysis , 2015, COLT 2016.
[687] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.
[688] Zoubin Ghahramani,et al. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.
[689] Grzegorz Chrupala,et al. Learning language through pictures , 2015, ACL.
[690] Philip Bachman,et al. Variational Generative Stochastic Networks with Collaborative Shaping , 2015, ICML.
[691] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.
[692] Rob Fergus,et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.
[693] Pierre-Luc Bacon. Conditional computation in neural networks using a decision-theoretic approach , 2015 .
[694] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[695] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[696] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[697] Zoubin Ghahramani,et al. Training generative neural networks via Maximum Mean Discrepancy optimization , 2015, UAI.
[698] B. Frey,et al. The human splicing code reveals new insights into the genetic determinants of disease , 2015, Science.
[699] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[700] Vadlamani Ravi,et al. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications , 2015, Knowl. Based Syst.
[701] Robert P. Sheridan,et al. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships , 2015, J. Chem. Inf. Model.
[702] Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.
[703] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[704] Misha Denil,et al. From Group to Individual Labels Using Deep Features , 2015, KDD.
[705] Sergey Levine,et al. Learning Visual Feature Spaces for Robotic Manipulation with Deep Spatial Autoencoders , 2015, ArXiv.
[706] Steve Renals,et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition , 2015, INTERSPEECH.
[707] Jason Weston,et al. Memory Networks , 2014, ICLR.
[708] Xavier Bouthillier,et al. Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets , 2014, NIPS.
[709] Koray Kavukcuoglu,et al. Multiple Object Recognition with Visual Attention , 2014, ICLR.
[710] Yoshua Bengio,et al. Reweighted Wake-Sleep , 2014, ICLR.
[711] Yoshua Bengio,et al. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks , 2015, ArXiv.
[712] Pascal Vincent,et al. GSNs: Generative Stochastic Networks , 2015, ArXiv.
[713] Navdeep Jaitly,et al. Pointer Networks , 2015, NIPS.
[714] Xiaodong He,et al. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems , 2015, WWW.
[715] Zhiyuan Liu,et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.
[716] Bolei Zhou,et al. Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.
[717] Alex Graves,et al. DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.
[718] Jason Morton,et al. When Does a Mixture of Products Contain a Product of Mixtures? , 2012, SIAM J. Discret. Math.
[719] Jason Weston,et al. Weakly Supervised Memory Networks , 2015, ArXiv.
[720] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[721] Hossein Mobahi,et al. A Theoretical Analysis of Optimization by Gaussian Continuation , 2015, AAAI.
[722] Ian J. Goodfellow,et al. On distinguishability criteria for estimating generative models , 2014, ICLR.
[723] Thomas Brox,et al. Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[724] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[725] Gabriel Kreiman,et al. Unsupervised Learning of Visual Structure using Predictive Generative Networks , 2015, ArXiv.
[726] Richard S. Zemel,et al. Generative Moment Matching Networks , 2015, ICML.
[727] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.
[728] Tapani Raiko,et al. Semi-supervised Learning with Ladder Networks , 2015, NIPS.
[729] Yoshua Bengio,et al. Gated Feedback Recurrent Neural Networks , 2015, ICML.
[730] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.
[731] Shin Ishii,et al. Distributional Smoothing with Virtual Adversarial Training , 2015, ICLR 2016.
[732] Ronan Collobert,et al. From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[733] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[734] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[735] Yoshua Bengio,et al. Low precision arithmetic for deep learning , 2014, ICLR.
[736] Yoshua Bengio,et al. NICE: Non-linear Independent Components Estimation , 2014, ICLR.
[737] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[738] Jürgen Schmidhuber,et al. Highway Networks , 2015, ArXiv.
[739] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[740] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[741] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
[742] Yoshua Bengio,et al. Early Inference in Energy-Based Models Approximates Back-Propagation , 2015, ArXiv.
[743] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[744] Phil Blunsom,et al. Learning to Transduce with Unbounded Memory , 2015, NIPS.
[745] Zhang Chun-xi. Restricted Boltzmann Machines , 2015 .
[746] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[747] Jonathan Tompson,et al. Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[748] Enrique Herrera-Viedma,et al. Sentiment analysis: A review and comparative analysis of web services , 2015, Inf. Sci.
[749] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[750] Wojciech Zaremba,et al. Reinforcement Learning Neural Turing Machines , 2015, ArXiv.
[751] Thomas S. Huang,et al. An Analysis of Unsupervised Pre-training in Light of Recent Advances , 2014, ICLR.
[752] Yoshua Bengio,et al. Training Bidirectional Helmholtz Machines , 2015 .
[753] Shengen Yan,et al. Deep Image: Scaling up Image Recognition , 2015, ArXiv.
[754] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[755] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.
[756] Joelle Pineau,et al. Conditional Computation in Neural Networks for faster models , 2015, ArXiv.
[757] Yoshua Bengio,et al. BilBOWA: Fast Bilingual Distributed Representations without Word Alignments , 2014, ICML.
[758] Jitendra Malik,et al. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[759] Jeffrey Dean,et al. Large-Scale Deep Learning For Building Intelligent Computer Systems , 2016, WSDM.
[760] Yves Grandvalet,et al. Combining Two And Three-Way Embeddings Models for Link Prediction in Knowledge Bases , 2016, J. Artif. Intell. Res.
[761] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[762] Matthias Bethge,et al. A note on the evaluation of generative models , 2015, ICLR.
[763] Ruslan Salakhutdinov,et al. Importance Weighted Autoencoders , 2015, ICLR.
[764] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[765] Oriol Vinyals,et al. Multilingual Language Processing From Bytes , 2015, NAACL.
[766] Yoshua Bengio,et al. Knowledge Matters: Importance of Prior Information for Optimization , 2013, J. Mach. Learn. Res.
[767] Alex Graves,et al. Grid Long Short-Term Memory , 2015, ICLR.
[768] Tapani Raiko,et al. Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence , 2016, ESANN.
[769] Phillipp Kaestner,et al. Linear And Nonlinear Programming , 2016 .
[770] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.
[771] Bozhkov Lachezar,et al. Echo State Network , 2017, Encyclopedia of Machine Learning and Data Mining.
[772] A. Hall,et al. Adaptive Switching Circuits , 2016 .
[773] F. Ramsey. Truth and Probability , 2016 .
[774] Jiri Matas,et al. All you need is a good init , 2015, ICLR.
[775] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.
[776] Claudia Biermann,et al. Mathematical Methods Of Statistics , 2016 .
[777] Vishal A. Kharde,et al. Sentiment Analysis of Twitter Data: A Survey of Techniques , 2016, ArXiv.
[778] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[779] Xiaojin Zhu,et al. Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.
[780] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[781] Miles Osborne,et al. Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.
[782] K. Schittkowski,et al. NONLINEAR PROGRAMMING , 2022 .