Neural Networks Regularization Through Representation Learning

Neural network models, and deep models in particular, are today among the state-of-the-art models in machine learning and its applications. Recent deep neural networks have many hidden layers, which significantly increases the total number of parameters. Training such models therefore requires a large number of labeled examples, which are not always available in practice. Overfitting is one of the fundamental problems of neural networks: it occurs when the model memorizes the training data, which makes it hard to generalize to new data. Overfitting in neural networks is the main theme of this thesis. Several remedies have been proposed in the literature, such as data augmentation, early stopping, and techniques more specific to neural networks such as dropout and batch normalization. In this thesis, we address overfitting in deep neural networks from the angle of representation learning, in the setting where little training data is available. To this end, we propose three contributions. The first contribution, presented in Chapter 2, concerns structured output problems, in which the output variables are high-dimensional and are generally linked by structural relations. Our proposal exploits these structural relations by learning them in an unsupervised way with autoencoders. We validated the approach on a multiple regression problem applied to facial landmark detection in face images. The approach sped up the training of the networks and improved their generalization.
The second contribution, presented in Chapter 3, exploits prior knowledge about the representations inside the hidden layers in the context of a classification task. This prior rests on the simple idea that examples of the same class should have the same internal representation. We formalized this prior as a penalty added to the loss function. Experiments on MNIST and its variants showed improved generalization of the neural networks, particularly when few training examples are used. Our third and last contribution, presented in Chapter 4, shows the value of transfer learning in applications where little training data is available. The main idea is to pre-train the filters of a convolutional network on a source task with a large dataset (ImageNet, for example), and then to insert them into a new network for the target task. In collaboration with the Henri Becquerel cancer center in Rouen, we built an automatic system based on this type of transfer learning for a medical application in which only a small labeled dataset is available. In this application, the task is to localize the third lumbar vertebra in a CT scan. Transfer learning, combined with suitable pre-processing and post-processing, gave good results and allowed the model to be deployed in clinical routine.
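
As an illustration of the class-wise prior of the second contribution, the following is a minimal sketch of what such a penalty could look like. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the squared Euclidean distance between same-class pairs, the function names `class_wise_penalty` and `regularized_loss`, and the weight `lam`.

```python
import numpy as np


def class_wise_penalty(hidden, labels):
    """Mean squared Euclidean distance between the hidden representations
    of same-class pairs (an assumed form, not the thesis's definition)."""
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared distances between representations, shape (n, n).
    diff = hidden[:, None, :] - hidden[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Keep each unordered pair (i < j) once, and only if the labels match.
    same_class = labels[:, None] == labels[None, :]
    pairs = np.triu(same_class, k=1)
    n_pairs = pairs.sum()
    if n_pairs == 0:
        return 0.0  # no same-class pair in this batch
    return float(sq_dist[pairs].sum() / n_pairs)


def regularized_loss(supervised_loss, hidden, labels, lam=0.1):
    """Supervised loss plus the weighted penalty; `lam` is a hypothetical
    hyperparameter controlling the strength of the prior."""
    return supervised_loss + lam * class_wise_penalty(hidden, labels)


# Two samples of class 0 that differ by 2 along one axis, one sample of class 1.
h = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
y = [0, 0, 1]
print(class_wise_penalty(h, y))              # 4.0
print(regularized_loss(1.0, h, y, lam=0.5))  # 3.0
```

The penalty is zero when all same-class examples in a batch share the same hidden representation, and it grows as those representations drift apart, which is the behavior the prior asks for.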


[254]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[255]  Yoshua Bengio,et al.  An empirical analysis of dropout in piecewise linear networks , 2013, ICLR.

[256]  Paul J. Werbos,et al.  Applications of advances in nonlinear sensitivity analysis , 1982 .

[257]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry , 1969 .

[258]  Daniela M. Witten,et al.  An Introduction to Statistical Learning: with Applications in R , 2013 .

[259]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[260]  Hermann Ney,et al.  Mean-normalized stochastic gradient for large-scale deep learning , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[261]  Emilio Soria Olivas,et al.  Handbook of Research on Machine Learning Applications and Trends : Algorithms , Methods , and Techniques , 2009 .

[262]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[263]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[264]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[265]  Timothy Dozat,et al.  Incorporating Nesterov Momentum into Adam , 2016 .

[266]  Romain Hérault,et al.  Spotting L3 slice in CT scans using deep convolutional network and transfer learning , 2017, Comput. Biol. Medicine.

[267]  Xingping Sun,et al.  The fundamentality of sets of ridge functions , 1992 .

[268]  Yaroslav Bulatov,et al.  Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[269]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[270]  Peter H Whincup,et al.  Sarcopenic Obesity and Risk of Cardiovascular Disease and Mortality: A Population-Based Cohort Study of Older Men , 2014, Journal of the American Geriatrics Society.

[271]  Christopher Straus,et al.  Comparison of Two Deformable Registration Algorithms in the Presence of Radiologic Change Between Serial Lung CT Scans , 2015, Journal of Digital Imaging.

[272]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[273]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[274]  Dorin Comaniciu,et al.  Spine detection in CT and MR using iterated marginal space learning , 2013, Medical Image Anal..

[275]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[276]  Shai Ben-David,et al.  A notion of task relatedness yielding provable multiple-task learning guarantees , 2008, Machine Learning.

[277]  K Ogawa,et al.  Impact of Sarcopenia on Survival in Patients Undergoing Living Donor Liver Transplantation , 2013, American journal of transplantation : official journal of the American Society of Transplantation and the American Society of Transplant Surgeons.

[278]  Luiz Eduardo Soares de Oliveira,et al.  Writer-independent feature learning for Offline Signature Verification using Deep Convolutional Neural Networks , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).

[279]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[280]  Firoj Alam,et al.  Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises , 2017, ISCRAM.

[281]  James Martens,et al.  Deep learning via Hessian-free optimization , 2010, ICML.

[282]  Kilian Q. Weinberger,et al.  Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.

[283]  Ralf Klinkenberg,et al.  Data Classification: Algorithms and Applications , 2014 .

[284]  Ye Wang,et al.  Improving Content-based and Hybrid Music Recommendation using Deep Learning , 2014, ACM Multimedia.

[285]  Ayse Betül Oktay,et al.  Localization of the Lumbar Discs Using Machine Learning and Exact Probabilistic Inference , 2011, MICCAI.

[286]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[287]  Yoshua Bengio,et al.  Gated Feedback Recurrent Neural Networks , 2015, ICML.

[288]  Richard Socher,et al.  Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.

[289]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[290]  Vysoké Učení,et al.  Statistical Language Models Based on Neural Networks , 2012 .

[291]  Aaron C. Courville,et al.  Deep Learning Vector Quantization , 2016, ESANN.

[292]  Mark G van Vledder,et al.  Sarcopenia negatively impacts short-term outcomes in patients undergoing hepatic resection for colorectal liver metastasis. , 2011, HPB : the official journal of the International Hepato Pancreato Biliary Association.

[293]  Yongxin Yang,et al.  Deep Multi-task Representation Learning: A Tensor Factorisation Approach , 2016, ICLR.

[294]  Rishi Bedi,et al.  Deep Reinforcement Learning for Simulated Autonomous Vehicle Control , 2016 .

[295]  Kunihiko Fukushima,et al.  Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[296]  S. Grossberg Contour Enhancement , Short Term Memory , and Constancies in Reverberating Neural Networks , 1973 .

[297]  M. F. Møller,et al.  Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in 0(N) Time , 1993 .

[298]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[299]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[300]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[301]  Shai Ben-David,et al.  Understanding Machine Learning: From Theory to Algorithms , 2014 .

[302]  Jocelyn Sietsma,et al.  Creating artificial neural networks that generalize , 1991, Neural Networks.

[303]  S B Heymsfield,et al.  Cadaver validation of skeletal muscle measurement by magnetic resonance imaging and computerized tomography. , 1998, Journal of applied physiology.

[304]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[305]  Sridhar Mahadevan,et al.  Manifold alignment using Procrustes analysis , 2008, ICML '08.

[306]  Geoffrey E. Hinton Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[307]  D. S. Jeng,et al.  Self-organizing polynomial neural network for modelling complex hydrological processes , 2005 .

[308]  Ambedkar Dukkipati,et al.  To go deep or wide in learning? , 2014, AISTATS.

[309]  Yonghui Wu,et al.  Exploring the Limits of Language Modeling , 2016, ArXiv.

[310]  Ronald M. Summers,et al.  Detection of Sclerotic Spine Metastases via Random Aggregation of Deep Convolutional Neural Network Classifications , 2014, ArXiv.

[311]  Robert Hecht-Nielsen,et al.  Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.

[312]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[313]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[314]  Stanley Heshka,et al.  Total body skeletal muscle and adipose tissue volumes: estimation from a single abdominal cross-sectional image. , 2004, Journal of applied physiology.

[315]  Ben Glocker,et al.  Robust Registration of Longitudinal Spine CT , 2014, MICCAI.

[316]  Heiga Zen,et al.  Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[317]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[318]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[319]  S. Linnainmaa Taylor expansion of the accumulated rounding error , 1976 .

[320]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[321]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[322]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[323]  Jürgen Schmidhuber,et al.  A local learning algorithm for dynamic feedforward and recurrent networks , 1990, Forschungsberichte, TU Munich.

[324]  Edwin V. Bonilla,et al.  Multi-task Gaussian Process Prediction , 2007, NIPS.

[325]  Neil D. Lawrence,et al.  Learning to learn with the informative vector machine , 2004, ICML.

[326]  Hayit Greenspan,et al.  Deep learning with non-medical training used for chest pathology identification , 2015, Medical Imaging.

[327]  D. Hansel,et al.  Memorization Without Generalization in a Multilayered Neural Network , 1992 .

[328]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[329]  T. Munich,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[330]  Ameet Talwalkar,et al.  Foundations of Machine Learning , 2012, Adaptive computation and machine learning.

[331]  Jürgen Schmidhuber,et al.  My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013 , 2013, ArXiv.

[332]  Masashi Sugiyama,et al.  Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis , 2007, J. Mach. Learn. Res..

[333]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[334]  L. Mccargar,et al.  Cancer cachexia in the age of obesity: skeletal muscle depletion is a powerful prognostic factor, independent of body mass index. , 2013, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[335]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[336]  E. Thorndike,et al.  The influence of improvement in one mental function upon the efficiency of other functions. (I). , 1901 .

[337]  Traian Rebedea,et al.  Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay , 2016, ArXiv.

[338]  Dana H. Ballard,et al.  Modular Learning in Neural Networks , 1987, AAAI.

[339]  Tony Jan,et al.  VQSVM: A case study for incorporating prior domain knowledge into inductive machine learning , 2010, Neurocomputing.

[340]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[341]  Thomas Brox,et al.  Learning to Generate Chairs, Tables and Cars with Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[342]  George A. Anastassiou,et al.  Approximation theory - moduli of continuity and global smoothness preservation , 1999 .

[343]  Yaser S. Abu-Mostafa,et al.  Hints and the VC Dimension , 1993, Neural Computation.

[344]  Bruno Stuner,et al.  Cohort of LSTM and lexicon verification for handwriting recognition with gigantic lexicon , 2016, ArXiv.

[345]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[346]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[347]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[348]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[349]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[350]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[351]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[352]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[353]  Razvan Pascanu,et al.  On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.

[354]  Yann LeCun,et al.  Regularization of Neural Networks using DropConnect , 2013, ICML.

[355]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[356]  Thierry Paquet,et al.  A Markovian Approach for Handwritten Document Segmentation , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[357]  Pradeep Dubey,et al.  Faster CNNs with Direct Sparse Convolutions and Guided Pruning , 2016, ICLR.

[358]  Ken-ichi Funahashi,et al.  On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.

[359]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[360]  Demetri Terzopoulos,et al.  Deformable models in medical image analysis: a survey , 1996, Medical Image Anal..

[361]  Tegan Maharaj,et al.  Deep Nets Don't Learn via Memorization , 2017, ICLR.

[362]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[363]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[364]  S. Dreyfus The numerical solution of variational problems , 1962 .

[365]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[366]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[367]  Andrew R. Barron,et al.  Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[368]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[369]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[370]  Ben Glocker,et al.  Automatic Localization and Identification of Vertebrae in Arbitrary Field-of-View CT Scans , 2012, MICCAI.

[371]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[372]  S. Cameron,et al.  Automatic spine identification in abdominal CT slices using image partition forests , 2009, 2009 Proceedings of 6th International Symposium on Image and Signal Processing and Analysis.

[373]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[374]  Yoshua Bengio,et al.  Deep Learning of Representations: Looking Forward , 2013, SLSP.

[375]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[376]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[377]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[378]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[379]  Jürgen Schmidhuber,et al.  Co-evolving recurrent neurons learn deep memory POMDPs , 2005, GECCO '05.

[380]  Sebastian Thrun,et al.  Learning One More Thing , 1994, IJCAI.

[381]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[382]  Rich Caruana,et al.  Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.

[383]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[384]  ChengXiang Zhai,et al.  Instance Weighting for Domain Adaptation in NLP , 2007, ACL.

[385]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[386]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[387]  Bernard Widrow,et al.  MADALINE RULE II: a training algorithm for neural networks , 1988, ICNN.

[388]  Jerry L Prince,et al.  Current methods in medical image segmentation. , 2000, Annual review of biomedical engineering.

[389]  Fernando Corinto,et al.  CNN-based algorithm for drusen identification , 2006, 2006 IEEE International Symposium on Circuits and Systems.

[390]  J. Knott The organization of behavior: A neuropsychological theory , 1951 .

[391]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[392]  T. Terlaky On lp programming , 1985 .

[393]  Jianmin Wang,et al.  Learning Multiple Tasks with Deep Relationship Networks , 2015, ArXiv.

[394]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[395]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[396]  Richard M. Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999, Machine Learning.

[397]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[398]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[399]  Tom M. Mitchell,et al.  The Need for Biases in Learning Generalizations , 2007 .

[400]  Rina Dechter,et al.  Learning While Searching in Constraint-Satisfaction-Problems , 1986, AAAI.

[401]  Narendra Ahuja,et al.  Cresceptron: a self-organizing neural network which grows adaptively , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[402]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[403]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[404]  Shih-Chii Liu,et al.  Computation with Spikes in a Winner-Take-All Network , 2009, Neural Computation.