Yoshua Bengio,et al. Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.
 Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
 Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
 Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.
 Jürgen Schmidhuber,et al. Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement , 2004, Machine Learning.
 Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
 Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2014, ICML.
 Jürgen Schmidhuber. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.
 George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
 Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
 Gregory R. Koch,et al. Siamese Neural Networks for One-Shot Image Recognition , 2015 .