Deep Architectures for Baby AI

Motivation: understanding intelligence, building AI, scaling to large scale learning of complex functions

[1]  Nicolas Le Roux,et al.  Learning Eigenfunctions Links Spectral Embedding and Kernel PCA , 2004, Neural Computation.

[2]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[3]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[4]  R. Zemel A minimum description length framework for unsupervised learning , 1994 .

[5]  Jason Weston,et al.  Large-scale kernel machines , 2007 .

[6]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[7]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[8]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[9]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[10]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11]  Geoffrey E. Hinton,et al.  Extracting distributed representations of concepts and relations from positive and negative propositions , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[12]  Geoffrey E. Hinton,et al.  Self Supervised Boosting , 2002, NIPS.

[13]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[14]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[15]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[16]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[17]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[18]  G. Tesauro Practical Issues in Temporal Difference Learning , 1992 .

[19]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[20]  Yee Whye Teh,et al.  Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation , 2006, Cogn. Sci..

[21]  Yoshua Bengio,et al.  Non-Local Manifold Tangent Learning , 2004, NIPS.

[22]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[23]  Thomas G. Dietterich,et al.  Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA , 2001, NIPS.

[24]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[25]  Geoffrey E. Hinton,et al.  Learning distributed representations of concepts. , 1989 .

[26]  J. Håstad Computational limitations of small-depth circuits , 1987 .

[27]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[28]  Nicolas Le Roux,et al.  The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.

[29]  Thomas G. Dietterich,et al.  Editors. Advances in Neural Information Processing Systems , 2002 .

[30]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..

[31]  Geoffrey E. Hinton,et al.  Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[32]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[33]  Pascal Vincent,et al.  Non-Local Manifold Parzen Windows , 2005, NIPS.

[34]  Brian Hazlehurst,et al.  How to invent a lexicon: the development of shared symbols in interaction , 2006 .

[35]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[36]  P. L. Adams THE ORIGINS OF INTELLIGENCE IN CHILDREN , 1976 .

[37]  Eric Allender,et al.  Circuit Complexity before the Dawn of the New Millennium , 1996, FSTTCS.

[38]  Yoshua Bengio,et al.  An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.

[39]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[40]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[41]  Mahesan Niranjan,et al.  Neural networks and radial basis functions in classifying static speech patterns , 1990 .

[42]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[43]  J. van Leeuwen,et al.  Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.

[44]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[45]  R. Guillery Is postnatal neocortical maturation hierarchical? , 2005, Trends in Neurosciences.

[46]  David J. Spiegelhalter,et al.  Advances in Neural Information Processing Systems 15 (NIPS 2002) , 2002 .

[47]  Garrison W. Cottrell,et al.  Non-Linear Dimensionality Reduction , 1992, NIPS.

[48]  Paul E. Utgoff,et al.  Many-Layered Learning , 2002, Neural Computation.