On the challenge of learning complex functions.
