Learning Deep Architectures for AI
[1] H. Hotelling. Analysis of a complex of statistical variables into principal components , 1933 .
[2] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.
[3] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..
[4] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..
[5] A. Kolmogorov. Three approaches to the quantitative definition of information , 1968 .
[6] C. S. Wallace,et al. An Information Measure for Classification , 1968, Comput. J..
[7] F. O'connor. Energy Budget , 1971, Nature.
[8] J. Piaget,et al. The Origins of Intelligence in Children , 1971 .
[9] J. M. Hammersley,et al. Markov fields on finite graphs and lattices , 1971 .
[10] Hans Hermes,et al. Introduction to mathematical logic , 1973, Universitext.
[11] M. A. Griffin,et al. Information Processing Systems , 1976 .
[12] James L. McClelland,et al. An interactive activation model of context effects in letter perception: I. An account of basic findings. , 1981 .
[13] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[14] Donald Geman,et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Geoffrey E. Hinton,et al. A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..
[16] A. Yao. Separating the polynomial-time hierarchy by oracles , 1985 .
[17] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[18] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory , 1986 .
[19] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[20] Rajesh Sharma,et al. Asymptotic analysis , 1986 .
[21] Johan Håstad,et al. Almost optimal lower bounds for small depth circuits , 1986, STOC '86.
[22] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .
[23] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[24] S. Duane,et al. Hybrid Monte Carlo , 1987 .
[25] Elliott Mendelson,et al. Introduction to mathematical logic (3. ed.) , 1987 .
[26] Ingo Wegener,et al. The complexity of Boolean functions , 1987 .
[27] Yann LeCun,et al. Memoires associatives distribuees: Une comparaison (Distributed associative memories: A comparison) , 1987 .
[28] James L. McClelland,et al. Explorations in parallel distributed processing: a handbook of models, programs, and exercises , 1988 .
[29] J. Stephen Judd,et al. Learning in neural networks , 1988, COLT '88.
[30] James L. McClelland. Explorations In Parallel Distributed Processing , 1988 .
[31] Geoffrey E. Hinton,et al. Learning distributed representations of concepts. , 1989 .
[32] Geoffrey E. Hinton,et al. Parallel Models of Associative Memory , 1989 .
[33] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[34] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[35] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..
[36] E. Allgower,et al. Numerical Continuation Methods , 1990 .
[37] W. Pitts,et al. A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.
[38] Jordan B. Pollack,et al. Recursive Distributed Representations , 1990, Artif. Intell..
[39] Eugene L. Allgower,et al. Numerical continuation methods - an introduction , 1990, Springer series in computational mathematics.
[40] Risto Miikkulainen,et al. Natural Language Processing With Modular PDP Networks and Distributed Lexicon , 1991, Cogn. Sci..
[41] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[42] David Haussler,et al. Unsupervised learning of distributions on binary vectors using two layer networks , 1991, NIPS 1991.
[43] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.
[44] David H. Wolpert,et al. Stacked generalization , 1992, Neural Networks.
[45] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.
[46] Radford M. Neal. Connectionist Learning of Belief Networks , 1992, Artif. Intell..
[47] Geoffrey E. Hinton,et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.
[48] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.
[49] J. Elman. Learning and development in neural networks: the importance of starting small , 1993, Cognition.
[50] Maurice Milgram,et al. Transformation Invariant Autoassociation with Application to Handwritten Character Recognition , 1994, NIPS.
[51] David A. Cohn,et al. Active Learning with Statistical Models , 1996, NIPS.
[52] Pekka Orponen,et al. Computational complexity of neural networks: a survey , 1994 .
[53] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[54] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[55] Peter Tiňo,et al. Learning long-term dependencies is not as difficult with NARX recurrent neural networks , 1995 .
[56] Carl E. Rasmussen,et al. In Advances in Neural Information Processing Systems , 2011 .
[57] Terrence J. Sejnowski,et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.
[58] Sebastian Thrun,et al. Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.
[59] J. J. Moré,et al. Global continuation for distance geometry problems , 1995 .
[60] Geoffrey E. Hinton,et al. The Helmholtz Machine , 1995, Neural Computation.
[61] Geoffrey E. Hinton,et al. The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.
[62] Jonathan Baxter,et al. Learning internal representations , 1995, COLT '95.
[63] J. J. Moré,et al. Smoothing techniques for macromolecular global optimization , 1995 .
[64] Zhi-jun Wu. Global Continuation for Distance Geometry Problems , 1995 .
[65] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .
[66] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..
[67] Nathan Intrator,et al. How to Make a Low-Dimensional Representation Suitable for Diverse Tasks , 1996 .
[68] Larry A. Rendell,et al. Learning Despite Concept Variation by Finding Structure in Attribute-based Data , 1996, ICML.
[69] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[70] Barak A. Pearlmutter,et al. A Context-Sensitive Generalization of ICA , 1996 .
[71] Thomas F. Coleman,et al. Parallel continuation-based global optimization for molecular conformation and protein folding , 1994, J. Glob. Optim..
[72] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.
[73] Larry A. Rendell,et al. Global Data Analysis and the Fragmentation Problem in Decision Tree Induction , 1997, ECML.
[74] William I. Gasarch,et al. Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series)) , 1997, SIGACT News.
[75] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[76] Paul M. B. Vitányi,et al. An Introduction to Kolmogorov Complexity and Its Applications , 1993, Graduate Texts in Computer Science.
[77] H. Sebastian Seung,et al. Learning Continuous Attractors in Recurrent Networks , 1997, NIPS.
[78] Jorge J. Moré,et al. Global Continuation for Distance Geometry Problems , 1995, SIAM J. Optim..
[79] Terrence J. Sejnowski,et al. Learning Nonlinear Overcomplete Representations for Efficient Coding , 1997, NIPS.
[80] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[81] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.
[82] Michael I. Jordan. Learning in Graphical Models , 1999, NATO ASI Series.
[83] Jorma Rissanen,et al. Stochastic Complexity in Statistical Inquiry , 1989, World Scientific Series in Computer Science.
[84] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[85] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.
[86] Dima Grigoriev,et al. Complexity Lower Bounds for Approximation Algebraic Computation Trees , 1999, J. Complex..
[87] B. Schölkopf,et al. Advances in kernel methods: support vector learning , 1999 .
[88] Gunnar Rätsch,et al. Input space versus feature space in kernel-based methods , 1999, IEEE Trans. Neural Networks.
[89] Terrence J. Sejnowski,et al. Unsupervised Learning , 2018, Encyclopedia of GIS.
[90] Yair Weiss,et al. Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.
[91] Pietro Perona,et al. Unsupervised Learning of Models for Recognition , 2000, ECCV.
[92] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.
[93] Nathalie Japkowicz,et al. Nonlinear Autoassociation Is Not Equivalent to PCA , 2000, Neural Computation.
[94] Terrence J. Sejnowski,et al. Learning Overcomplete Representations , 2000, Neural Computation.
[95] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[96] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[97] Geoffrey E. Hinton,et al. Extracting distributed representations of concepts and relations from positive and negative propositions , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.
[98] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.
[99] N. Cristianini,et al. On Kernel-Target Alignment , 2001, NIPS.
[100] E. Oja,et al. Independent Component Analysis , 2013 .
[101] Michael I. Jordan,et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.
[102] S. Laughlin,et al. An Energy Budget for Signaling in the Grey Matter of the Brain , 2001, Journal of cerebral blood flow and metabolism : official journal of the International Society of Cerebral Blood Flow and Metabolism.
[103] Yee Whye Teh,et al. A New View of ICA , 2001 .
[104] Lei Wang,et al. Learning kernel parameters by using class separability measure , 2002 .
[105] Geoffrey E. Hinton,et al. Self Supervised Boosting , 2002, NIPS.
[106] Mikhail Belkin,et al. Using manifold structure for partially labelled classification , 2002, NIPS 2002.
[107] Paul E. Utgoff,et al. Many-Layered Learning , 2002, Neural Computation.
[108] Terrence J. Sejnowski,et al. Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.
[109] Clifton B. Chadwick. What is learning , 2002 .
[110] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[111] Thomas G. Dietterich,et al. Editors. Advances in Neural Information Processing Systems , 2002 .
[112] Matthew Brand,et al. Charting a Manifold , 2002, NIPS.
[113] Jean-Luc Gauvain,et al. Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[114] Michael Schmitt,et al. Descartes' Rule of Signs for Radial Basis Function Neural Networks , 2002, Neural Computation.
[115] Zoubin Ghahramani,et al. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.
[116] Ahmad Emami,et al. Training Connectionist Models for the Structured Language Model , 2003, EMNLP.
[117] Bernhard Schölkopf,et al. Learning with Local and Global Consistency , 2003, NIPS.
[118] Tai Sing Lee,et al. Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.
[119] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..
[120] Thomas Gärtner,et al. A survey of kernels for structured data , 2003, SKDD.
[121] Yee Whye Teh,et al. Energy-Based Models for Sparse Overcomplete Representations , 2003, J. Mach. Learn. Res..
[122] P. Lennie. The Cost of Cortical Computation , 2003, Current Biology.
[123] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[124] Nello Cristianini,et al. Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..
[125] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[126] Jonathan Baxter,et al. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling , 1997, Machine Learning.
[127] Geoffrey E. Hinton,et al. Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.
[128] G. Peterson. A day of great illumination: B. F. Skinner's discovery of shaping. , 2004, Journal of the experimental analysis of behavior.
[129] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[130] Nando de Freitas,et al. An Introduction to MCMC for Machine Learning , 2004, Machine Learning.
[131] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.
[132] Mehryar Mohri,et al. Rational Kernels: Theory and Algorithms , 2004, J. Mach. Learn. Res..
[133] Robert Tibshirani,et al. The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..
[134] Mikhail Belkin,et al. Regularization and Semi-supervised Learning on Large Graphs , 2004, COLT.
[135] Nicolas Le Roux,et al. Learning Eigenfunctions Links Spectral Embedding and Kernel PCA , 2004, Neural Computation.
[136] H. Schwenk,et al. Efficient training of large neural networks for language modeling , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).
[137] Yoshua Bengio,et al. Non-Local Manifold Tangent Learning , 2004, NIPS.
[138] Samy Bengio,et al. Links between perceptrons, MLPs and SVMs , 2004, ICML.
[139] Y. LeCun,et al. Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..
[140] H. Bourlard,et al. Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.
[141] R. Guillery. Is postnatal neocortical maturation hierarchical? , 2005, Trends in Neurosciences.
[142] Marcus Hutter. Simulation Algorithms for Computational Systems Biology , 2017, Texts in Theoretical Computer Science. An EATCS Series.
[143] Johan Håstad,et al. On the power of small-depth threshold circuits , 1991, computational complexity.
[144] L. Bottou,et al. Training Invariant Support Vector Machines using Selective Sampling , 2005 .
[145] Nicolas Le Roux,et al. Convex Neural Networks , 2005, NIPS.
[146] Nicolas Le Roux,et al. Efficient Non-Parametric Function Induction in Semi-Supervised Learning , 2004, AISTATS.
[147] Jean-Luc Gauvain,et al. Building continuous space language models for transcribing european languages , 2005, INTERSPEECH.
[148] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[149] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..
[150] Emmanuel J. Candès,et al. Decoding by linear programming , 2005, IEEE Transactions on Information Theory.
[151] Nicolas Le Roux,et al. The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.
[152] Michael S. Lewicki,et al. A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels , 2005, NIPS.
[153] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.
[154] Yann LeCun,et al. Loss Functions for Discriminative Training of Energy-Based Models , 2005, AISTATS.
[155] Brian Hazlehurst,et al. How to invent a lexicon: the development of shared symbols in interaction , 2006 .
[156] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[157] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[158] Yann LeCun,et al. Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[159] Geoffrey E. Hinton,et al. Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.
[160] Fu Jie Huang,et al. A Tutorial on Energy-Based Learning , 2006 .
[161] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[162] Yoshua Bengio,et al. Nonlocal Estimation of Manifold Structure , 2006, Neural Computation.
[163] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.
[164] Yee Whye Teh,et al. Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation , 2006, Cogn. Sci..
[165] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.
[166] Tom Minka,et al. Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[167] Max Welling Donald,et al. Products of Experts , 2007 .
[168] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.
[169] Roger B. Grosse,et al. Shift-Invariance Sparse Coding for Audio Classification , 2007, UAI.
[170] Aapo Hyvärinen,et al. Connections Between Score Matching, Contrastive Divergence, and Pseudolikelihood for Continuous-Valued Variables , 2007, IEEE Transactions on Neural Networks.
[171] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.
[172] David G. Lowe,et al. University of British Columbia. , 1945, Canadian Medical Association journal.
[173] Antonio Torralba,et al. Describing Visual Scenes Using Transformed Objects and Parts , 2008, International Journal of Computer Vision.
[174] Geoffrey E. Hinton,et al. Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[175] Ivan Titov,et al. Constituent Parsing with Incremental Sigmoid Belief Networks , 2007, ACL.
[176] Marc'Aurelio Ranzato,et al. A Unified Energy-Based Framework for Unsupervised Learning , 2007, AISTATS.
[177] Yann LeCun,et al. A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[178] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.
[179] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.
[180] Jason Weston,et al. Large-scale kernel machines , 2007 .
[181] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .
[182] Juan Carlos Niebles,et al. A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[183] Geoffrey E. Hinton,et al. Modeling image patches with a directed hierarchy of Markov random fields , 2007, NIPS.
[184] Thomas Serre,et al. A quantitative theory of immediate visual recognition. , 2007, Progress in brain research.
[185] Geoffrey E. Hinton,et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.
[186] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.
[187] Yoshua Bengio,et al. An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.
[188] Geoffrey E. Hinton,et al. Three new graphical models for statistical language modelling , 2007, ICML '07.
[189] Geoffrey E. Hinton,et al. Learning Multilevel Distributed Representations for High-Dimensional Sequences , 2007, AISTATS.
[190] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[191] Katherine A. Heller,et al. A Nonparametric Bayesian Approach to Modeling Overlapping Clusters , 2007, AISTATS.
[192] Aapo Hyvärinen,et al. Some extensions of score matching , 2007, Comput. Stat. Data Anal..
[193] Geoffrey E. Hinton,et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes , 2007, NIPS.
[194] Alex Bateman,et al. An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.
[195] Aapo Hyvärinen,et al. A Two-Layer ICA-Like Model Estimated by Score Matching , 2007, ICANN.
[196] Yann LeCun,et al. Deep belief net learning in a long-range vision system for autonomous off-road driving , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[197] Ruslan Salakhutdinov,et al. On the quantitative analysis of deep belief networks , 2008, ICML '08.
[198] Yihong Gong,et al. Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.
[199] Marc'Aurelio Ranzato,et al. Semi-supervised learning of compact document representations with deep networks , 2008, ICML '08.
[200] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[201] Nicolas Le Roux,et al. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks , 2008, Neural Computation.
[202] Ruslan Salakhutdinov,et al. Evaluating probabilities under high-dimensional latent variable models , 2008, NIPS.
[203] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[204] Tijmen Tieleman,et al. Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.
[205] Michael I. Jordan,et al. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators , 2008, ICML '08.
[206] Jason Weston,et al. Deep learning via semi-supervised embedding , 2008, ICML '08.
[207] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[208] Katherine A. Heller,et al. Statistical models for partial membership , 2008, ICML '08.
[209] Antonio Torralba,et al. Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[210] Guillermo Sapiro,et al. Supervised Dictionary Learning , 2008, NIPS.
[211] Yoshua Bengio,et al. Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.
[212] Geoffrey E. Hinton,et al. A Scalable Hierarchical Distributed Language Model , 2008, NIPS.
[213] Botond Cseke,et al. Advances in Neural Information Processing Systems 20 (NIPS 2007) , 2008 .
[214] David M. Bradley,et al. Differentiable Sparse Coding , 2008, NIPS.
[215] Nicolas Pinto,et al. Establishing Good Benchmarks and Baselines for Face Recognition , 2008 .
[216] Geoffrey E. Hinton,et al. Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.
[217] Yoshua Bengio,et al. Slow, Decorrelated Features for Pretraining Complex Cell-like Networks , 2009, NIPS.
[218] Yoshua Bengio,et al. Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..
[219] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[220] Geoffrey E. Hinton,et al. Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.
[221] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.
[222] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[223] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.
[224] Pascal Vincent,et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training , 2009, AISTATS.
[225] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason..
[226] Kai A. Krueger,et al. Flexible shaping: How learning in small steps helps , 2009, Cognition.
[227] Hossein Mobahi,et al. Deep learning from temporal coherence in video , 2009, ICML '09.