[1] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.
[2] Romain Hérault,et al. IODA: An input/output deep architecture for image labeling , 2015, Pattern Recognit..
[3] Kunihiko Fukushima,et al. Training multi-layered neural network neocognitron , 2013, Neural Networks.
[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[5] Jürgen Schmidhuber,et al. Transfer learning for Latin and Chinese characters with Deep Neural Networks , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[6] Geoffrey E. Hinton,et al. Matrix capsules with EM routing , 2018, ICLR.
[7] Stephen Cox,et al. RecNorm: Simultaneous Normalisation and Classification Applied to Speech Recognition , 1990, NIPS.
[8] D T Jones,et al. Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.
[9] Mounim A. El-Yacoubi,et al. A Statistical Approach for Phrase Location and Recognition within a Text Line: An Application to Street Name Recognition , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[10] Yongxin Yang,et al. Trace Norm Regularised Deep Multi-Task Learning , 2016, ICLR.
[11] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.
[12] Yoshua Bengio,et al. Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.
[13] Koby Crammer,et al. Analysis of Representations for Domain Adaptation , 2006, NIPS.
[14] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[15] Hal Daumé,et al. Frustratingly Easy Domain Adaptation , 2007, ACL.
[16] George A. Anastassiou,et al. Intelligent Systems II: Complete Approximation by Neural Network Operators , 2015, Studies in Computational Intelligence.
[17] James P. Reilly,et al. Minimizing Nonconvex Functions for Sparse Vector Reconstruction , 2010, IEEE Transactions on Signal Processing.
[18] Jianhua Wang,et al. Coupling CRFs and Deformable Models for 3D Medical Image Segmentation , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[19] Joos Vandewalle,et al. Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications , 2012 .
[20] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.
[21] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[22] A. Gamba,et al. Further experiments with PAPA , 1961 .
[23] Rick Chartrand,et al. Exact Reconstruction of Sparse Signals via Nonconvex Minimization , 2007, IEEE Signal Processing Letters.
[24] Surya Ganguli,et al. Analyzing noise in autoencoders and deep networks , 2014, ArXiv.
[25] Bernhard Schölkopf,et al. Semi-Supervised Learning (Adaptive Computation and Machine Learning) , 2006 .
[26] T R Miller,et al. Three-dimensional display in nuclear medicine and radiology. , 1991, Journal of nuclear medicine : official publication, Society of Nuclear Medicine.
[27] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[28] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.
[29] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.
[30] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.
[31] Úlfar Erlingsson,et al. The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets , 2018, ArXiv.
[32] Kunihiko Fukushima,et al. Increasing robustness against background noise: Visual pattern recognition by a neocognitron , 2011, Neural Networks.
[33] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Franz Josef Och,et al. Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.
[35] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[36] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.
[37] G. Marcus. The Algebraic Mind: Integrating Connectionism and Cognitive Science , 2001 .
[38] Elie Bienenstock,et al. Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.
[39] Y. Ye,et al. Lower Bound Theory of Nonzero Entries in Solutions of ℓ2-ℓp Minimization , 2010, SIAM J. Sci. Comput..
[40] Koby Crammer,et al. Learning Bounds for Domain Adaptation , 2007, NIPS.
[41] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..
[42] Yelong Shen,et al. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval , 2014, CIKM.
[43] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.
[44] J. Rissanen,et al. Modeling By Shortest Data Description , 1978, Autom..
[45] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[46] Timo Aila,et al. Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning , 2016, ArXiv.
[47] Viren Jain,et al. Deep and Wide Multiscale Recursive Networks for Robust Image Labeling , 2013, ICLR.
[48] Satomi Teraoka,et al. [Three Dimensional Display in Nuclear Medicine]. , 2015, Igaku butsuri : Nihon Igaku Butsuri Gakkai kikanshi = Japanese journal of medical physics : an official journal of Japan Society of Medical Physics.
[49] R. Lippmann,et al. An introduction to computing with neural nets , 1987, IEEE ASSP Magazine.
[50] Charles A. Micchelli,et al. A Spectral Regularization Framework for Multi-Task Structure Learning , 2007, NIPS.
[51] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[52] Sebastian Thrun,et al. Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.
[53] Mikel Olazaran,et al. A Sociological Study of the Official History of the Perceptrons Controversy , 1996 .
[54] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[55] Christopher M. Bishop,et al. Regularization and complexity control in feed-forward networks , 1995 .
[56] Stephan Günnemann,et al. Introduction to Tensor Decompositions and their Applications in Machine Learning , 2017, ArXiv.
[57] Henry S. Baird,et al. Document image defect models , 1995 .
[58] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[59] Loris Nanni,et al. Local binary patterns variants as texture descriptors for medical image analysis , 2010, Artif. Intell. Medicine.
[60] Ronald M. Summers,et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning , 2016, IEEE Transactions on Medical Imaging.
[61] Yinyu Ye,et al. A note on the complexity of Lp minimization , 2011, Math. Program..
[62] Mark Craven,et al. Learning Hidden Markov Models for Regression using Path Aggregation , 2008, UAI.
[63] Xiaohui Zhang,et al. Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging , 2014, ICLR.
[64] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.
[65] Seunghoon Hong,et al. Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[66] Anders Søgaard,et al. Deep multi-task learning with low level tasks supervised at lower layers , 2016, ACL.
[67] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[68] Tomaso Poggio,et al. Incorporating prior information in machine learning by creating virtual examples , 1998, Proc. IEEE.
[69] Andrew M. Keenan. Cardiovascular Nuclear Medicine and MRI: Quantitation and Clinical Applications , 1992 .
[70] Alekseĭ Grigorʹevich Ivakhnenko,et al. Cybernetics and forecasting techniques , 1967 .
[71] Shang-Hong Lai,et al. Learning-Based Vertebra Detection and Iterative Normalized-Cut Segmentation for Spinal MRI , 2009, IEEE Transactions on Medical Imaging.
[72] Geoffrey E. Hinton,et al. Massively Parallel Architectures for AI: NETL, Thistle, and Boltzmann Machines , 1983, AAAI.
[73] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[74] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[75] Rajat Raina,et al. Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.
[76] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[77] Albert B Novikoff,et al. ON CONVERGENCE PROOFS FOR PERCEPTRONS , 1963 .
[78] M.H. Hassoun,et al. Fundamentals of Artificial Neural Networks , 1996, Proceedings of the IEEE.
[79] Dumitru Erhan,et al. Deep Neural Networks for Object Detection , 2013, NIPS.
[80] Razvan Pascanu,et al. Natural Neural Networks , 2015, NIPS.
[81] Ting Yu. Incorporating prior domain knowledge into inductive machine learning: its implementation in contemporary capital markets , 2007 .
[82] Geoffrey E. Hinton,et al. A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..
[83] Andrew Zisserman,et al. Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.
[84] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.
[85] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.
[86] Gabriel Peyré,et al. Computational Optimal Transport , 2018, Found. Trends Mach. Learn..
[87] Fred A. Hamprecht,et al. Multi-modal Brain Tumor Segmentation using Deep Convolutional Neural Networks , 2014 .
[88] Alberto Del Bimbo,et al. Socializing the Semantic Gap , 2015, ACM Comput. Surv..
[89] Frank Rosenblatt,et al. PRINCIPLES OF NEURODYNAMICS. PERCEPTRONS AND THE THEORY OF BRAIN MECHANISMS , 1963 .
[90] Romain Modzelewski,et al. A higher body mass index and fat mass are factors predictive of docetaxel dose intensity. , 2013, Anticancer research.
[91] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[92] David H. Wolpert. The Lack of A Priori Distinctions Between Learning Algorithms , 1996 .
[93] Henry J. Kelley,et al. Gradient Theory of Optimal Flight Paths , 1960 .
[94] George Trigeorgis,et al. Domain Separation Networks , 2016, NIPS.
[95] Rohini K. Srihari,et al. Incorporating prior knowledge with weighted margin support vector machines , 2004, KDD.
[96] M. Fridman. Hidden Markov model regression , 1993 .
[97] Shafiq R. Joty,et al. Sleep Quality Prediction From Wearable Data Using Deep Learning , 2016, JMIR mHealth and uHealth.
[98] Karl-Georg Steffens. The history of approximation theory: from Euler to Bernstein , 2006 .
[99] Quoc V. Le,et al. Unsupervised Pretraining for Sequence to Sequence Learning , 2016, EMNLP.
[100] A Unified Neural Based Model for Structured Output Problems , 2015 .
[101] Vipin Chaudhary,et al. Automatic lumbar vertebra segmentation from clinical CT for wedge compression fracture diagnosis , 2011, Medical Imaging.
[102] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[103] Peter Grünwald,et al. A tutorial introduction to the minimum description length principle , 2004, ArXiv.
[104] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.
[105] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.
[106] Nicolas Courty,et al. Wasserstein discriminant analysis , 2016, Machine Learning.
[107] Nikos Paragios,et al. Automatic inference of articulated spine models in CT images using high-order Markov Random Fields , 2011, Medical Image Anal..
[108] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[109] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[110] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[111] Yinyu Ye,et al. An Efficient Algorithm for Minimizing a Sum of p-Norms , 1999, SIAM J. Optim..
[112] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[113] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.
[114] Marie desJardins,et al. Evaluation and selection of biases in machine learning , 1995, Machine Learning.
[115] F. Jardin,et al. Sarcopenia is an independent prognostic factor in elderly patients with diffuse large B-cell lymphoma treated with immunochemotherapy , 2014, Leukemia & lymphoma.
[116] Yann LeCun,et al. Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.
[117] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[118] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[119] Matthew Lai,et al. Deep Learning for Medical Image Segmentation , 2015, Deep Learning Applications in Medical Imaging.
[120] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.
[121] Allan Pinkus,et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1991, Neural Networks.
[122] Stefanos Zafeiriou,et al. A Semi-automatic Methodology for Facial Landmark Annotation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[123] Vicky Goh,et al. Imaging body composition in cancer patients: visceral obesity, sarcopenia and sarcopenic obesity may impact on clinical outcome , 2015, Insights into Imaging.
[124] Trevor Cohn,et al. Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser , 2015, ACL.
[125] Yann LeCun,et al. Toward automatic phenotyping of developing embryos from videos , 2005, IEEE Transactions on Image Processing.
[126] Trevor Hastie,et al. Statistical Learning with Sparsity: The Lasso and Generalizations , 2015 .
[127] Ben Goertzel. Are There Deep Reasons Underlying the Pathologies of Today's Deep Learning Algorithms? , 2015, AGI.
[128] Yuan Qi,et al. Contextual recognition of hand-drawn diagrams with conditional random fields , 2004, Ninth International Workshop on Frontiers in Handwriting Recognition.
[129] Ian Goodfellow,et al. Deep Learning with Differential Privacy , 2016, CCS.
[130] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution , 2006 .
[131] Ming Zhou,et al. A Recursive Recurrent Neural Network for Statistical Machine Translation , 2014, ACL.
[132] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[133] Yoshinori Sagisaka,et al. Phoneme boundary estimation using bidirectional recurrent neural networks and its applications , 1999, Systems and Computers in Japan.
[134] Hermann Ney,et al. A convergence analysis of log-linear training and its application to speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[135] Koby Crammer,et al. Learning from Multiple Sources , 2006, NIPS.
[136] Shai Ben-David,et al. A theoretical framework for learning from a pool of disparate data sources , 2002, KDD.
[137] Xiaoou Tang,et al. Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.
[138] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[139] Xiang Jiang. Representational Transfer in Deep Belief Networks , 2015, Canadian Conference on AI.
[140] Luca Rigazio,et al. Towards Deep Neural Network Architectures Robust to Adversarial Examples , 2014, ICLR.
[141] A. E. Bryson,et al. A Steepest-Ascent Method for Solving Optimum Programming Problems , 1962 .
[142] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[143] Yu Cheng,et al. Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[144] Dong Yu,et al. Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..
[145] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[146] Andrew McCallum,et al. Structured Prediction Energy Networks , 2015, ICML.
[147] Romain Hérault,et al. Deep multi-task learning with evolving weights , 2016, ESANN.
[148] Timothy F. Cootes,et al. Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.
[149] Romain Hérault,et al. Neural Networks Regularization Through Class-wise Invariant Representation Learning , 2017, ArXiv.
[150] Cynthia Dwork,et al. Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.
[151] Yaser S. Abu-Mostafa,et al. Learning from hints in neural networks , 1990, J. Complex..
[152] Jun Ma,et al. Hierarchical segmentation and identification of thoracic vertebra using learning-based edge detection and coarse-to-fine deformable model , 2010, Comput. Vis. Image Underst..
[153] Jason Weston,et al. Deep learning via semi-supervised embedding , 2008, ICML '08.
[154] Narendra Ahuja,et al. Learning Recognition and Segmentation Using the Cresceptron , 1997, International Journal of Computer Vision.
[155] Qiang Yang,et al. Boosting for transfer learning , 2007, ICML '07.
[156] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[157] Raymond J. Mooney,et al. Transfer Learning by Mapping with Minimal Target Data , 2008 .
[158] Daniel Dominic Sleator,et al. Parsing English with a Link Grammar , 1995, IWPT.
[159] Max Welling,et al. Markov Chain Monte Carlo and Variational Inference: Bridging the Gap , 2014, ICML.
[160] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev..
[161] Yaser S. Abu-Mostafa,et al. A Method for Learning From Hints , 1992, NIPS.
[162] Pascal Vincent,et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..
[163] Trevor Darrell,et al. FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation , 2016, ArXiv.
[164] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..
[165] Lilyana Mihalkova and Raymond Mooney,et al. Transfer Learning with Markov Logic Networks , 2006 .
[166] Geoffrey E. Hinton,et al. Dynamic Routing Between Capsules , 2017, NIPS.
[167] Amaury Habrard,et al. PAC-Bayes and domain adaptation , 2017, Neurocomputing.
[168] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[169] Massimiliano Pontil,et al. Exploiting Unrelated Tasks in Multi-Task Learning , 2012, AISTATS.
[170] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[171] Theodore L. Economopoulos,et al. Geometry-based vs. intensity-based medical image registration: A comparative study on 3D CT data , 2016, Comput. Biol. Medicine.
[172] Massimiliano Pontil,et al. Regularized multi-task learning , 2004, KDD.
[173] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[174] Ben Glocker,et al. Vertebrae Localization in Pathological Spine CT via Dense Classification from Sparse Annotations , 2013, MICCAI.
[175] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[176] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.
[177] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[178] A. G. Ivakhnenko,et al. Polynomial Theory of Complex Systems , 1971, IEEE Trans. Syst. Man Cybern..
[179] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[180] Umar Syed,et al. Enzyme function prediction with interpretable models. , 2009, Methods in molecular biology.
[181] Marvin Minsky,et al. Perceptrons: expanded edition , 1988 .
[182] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[183] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[184] L. Ljung,et al. Overtraining, regularization and searching for a minimum, with application to neural networks , 1995 .
[185] Kurt Hornik,et al. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks , 1990, Neural Networks.
[186] Giovanni Soda,et al. Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..
[187] Balas K. Natarajan,et al. Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..
[188] Massimiliano Pontil,et al. Multi-Task Feature Learning , 2006, NIPS.
[189] Patrice Y. Simard,et al. Using GPUs for machine learning algorithms , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).
[190] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[191] Benjamin Schrauwen,et al. Deep content-based music recommendation , 2013, NIPS.
[192] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[193] Raymond J. Mooney,et al. Mapping and Revising Markov Logic Networks for Transfer Learning , 2007, AAAI.
[194] Florian Schulze,et al. Automated landmarking and labeling of fully and partially scanned spinal columns in CT images , 2013, Medical Image Anal..
[195] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[196] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[197] Joachim Bingel,et al. Sluice networks: Learning what to share between loosely related tasks , 2017, ArXiv.
[198] Yoshimasa Tsuruoka,et al. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks , 2016, EMNLP.
[199] B. Irie,et al. Capabilities of three-layered perceptrons , 1988, IEEE 1988 International Conference on Neural Networks.
[200] Geoffrey Zweig,et al. Joint Language and Translation Modeling with Recurrent Neural Networks , 2013, EMNLP.
[201] Wangmeng Zuo,et al. Learning Deep CNN Denoiser Prior for Image Restoration , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[202] M. M. Hassan Mahmud,et al. Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations , 2007, NIPS.
[203] Jianqing Fan,et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .
[204] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[205] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[206] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[207] Koby Crammer,et al. A theory of learning from different domains , 2010, Machine Learning.
[208] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .
[209] P. Utgoff. Machine learning of inductive bias , 1986 .
[210] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .
[211] S. C. Suddarth,et al. Rule-Injection Hints as a Means of Improving Network Performance and Learning Time , 1990, EURASIP Workshop.
[212] Alex Fridman,et al. DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning , 2018, ArXiv.
[213] Rick Chartrand,et al. Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data , 2009, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro.
[214] Alexei A. Efros,et al. Investigating Human Priors for Playing Video Games , 2018, ICML.
[215] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[216] Barbara Hammer,et al. On the approximation capability of recurrent neural networks , 2000, Neurocomputing.
[217] Pierre Baldi,et al. Deep autoencoder neural networks for gene ontology annotation predictions , 2014, BCB.
[218] Marc Peter Deisenroth,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[219] R. S-A. Gatsaeva,et al. On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition , 2018 .
[220] Christopher Malon,et al. Identifying histological elements with convolutional neural networks , 2008, CSTST.
[221] Dana Cobzas,et al. Automated segmentation of muscle and adipose tissue on CT images for human body composition analysis , 2009, Medical Imaging.
[222] Chris Eliasmith,et al. Deep networks for robust visual recognition , 2010, ICML.
[223] A. N. Tikhonov,et al. Solutions of ill-posed problems , 1977 .
[224] Lawrence Carin,et al. Multi-Task Learning for Classification with Dirichlet Process Priors , 2007, J. Mach. Learn. Res..
[225] David Maxwell Chickering,et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.
[226] Yadong Mu,et al. Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues , 2017, ArXiv.
[227] Naftali Tishby,et al. Incorporating Prior Knowledge on Features into Learning , 2007, AISTATS.
[228] Tapani Raiko,et al. Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.
[229] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[230] Martial Hebert,et al. Cross-Stitch Networks for Multi-task Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[231] Yoshua Bengio,et al. A Closer Look at Memorization in Deep Networks , 2017, ICML.
[232] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[233] Jonathan Baxter,et al. A Model of Inductive Bias Learning , 2000, J. Artif. Intell. Res..
[234] Xinyu Zhang. A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA , 2017, ArXiv.
[235] Philip David,et al. Domain Adaptation for Semantic Segmentation of Urban Scenes , 2017 .
[236] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[237] Nitish Srivastava,et al. Improving Neural Networks with Dropout , 2013 .
[238] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[239] W. Light. Ridge Functions, Sigmoidal Functions and Neural Networks , 1993 .
[240] Mike Schuster,et al. On supervised learning from sequential data with applications for speech recognition , 1999 .
[241] Sergey Demyanov. Regularization methods for neural networks and related models , 2015 .
[242] F. Agakov,et al. Application of high-dimensional feature selection: evaluation for genomic prediction in man , 2015, Scientific Reports.
[243] Emmanuel J. Candès,et al. Decoding by linear programming , 2005, IEEE Transactions on Information Theory.
[244] David J. Kriegman,et al. Localizing Parts of Faces Using a Consensus of Exemplars , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[245] Jiawei Han,et al. Knowledge transfer via multiple model local structure mapping , 2008, KDD.
[246] Shai Ben-David,et al. Exploiting Task Relatedness for Multiple Task Learning , 2003, COLT.
[247] Geoffrey E. Hinton,et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction , 2011, UAI.
[248] Narendra S. Chaudhari,et al. Capturing Long-Term Dependencies for Protein Secondary Structure Prediction , 2004, ISNN.
[249] Jürgen Schmidhuber,et al. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.
[250] Yann LeCun,et al. Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[251] D. Luenberger. Optimization by Vector Space Methods , 1968 .
[252] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[253] Tomaso A. Poggio,et al. Representation Properties of Networks: Kolmogorov's Theorem Is Irrelevant , 1989, Neural Computation.
[254] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[255] Yoshua Bengio,et al. An empirical analysis of dropout in piecewise linear networks , 2013, ICLR.
[256] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .
[257] Marvin Minsky,et al. Perceptrons: An Introduction to Computational Geometry , 1969 .
[258] Daniela M. Witten,et al. An Introduction to Statistical Learning: with Applications in R , 2013 .
[259] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[260] Hermann Ney,et al. Mean-normalized stochastic gradient for large-scale deep learning , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[261] Emilio Soria Olivas,et al. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques , 2009 .
[262] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[263] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[264] Yann LeCun,et al. Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[265] Timothy Dozat,et al. Incorporating Nesterov Momentum into Adam , 2016 .
[266] Romain Hérault,et al. Spotting L3 slice in CT scans using deep convolutional network and transfer learning , 2017, Comput. Biol. Medicine.
[267] Xingping Sun,et al. The fundamentality of sets of ridge functions , 1992 .
[268] Yaroslav Bulatov,et al. Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.
[269] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[270] Peter H Whincup,et al. Sarcopenic Obesity and Risk of Cardiovascular Disease and Mortality: A Population-Based Cohort Study of Older Men , 2014, Journal of the American Geriatrics Society.
[271] Christopher Straus,et al. Comparison of Two Deformable Registration Algorithms in the Presence of Radiologic Change Between Serial Lung CT Scans , 2015, Journal of Digital Imaging.
[272] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[273] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[274] Dorin Comaniciu,et al. Spine detection in CT and MR using iterated marginal space learning , 2013, Medical Image Anal..
[275] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.
[276] Shai Ben-David,et al. A notion of task relatedness yielding provable multiple-task learning guarantees , 2008, Machine Learning.
[277] K Ogawa,et al. Impact of Sarcopenia on Survival in Patients Undergoing Living Donor Liver Transplantation , 2013, American journal of transplantation : official journal of the American Society of Transplantation and the American Society of Transplant Surgeons.
[278] Luiz Eduardo Soares de Oliveira,et al. Writer-independent feature learning for Offline Signature Verification using Deep Convolutional Neural Networks , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).
[279] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[280] Firoj Alam,et al. Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises , 2017, ISCRAM.
[281] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[282] Kilian Q. Weinberger,et al. Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.
[283] Ralf Klinkenberg,et al. Data Classification: Algorithms and Applications , 2014 .
[284] Ye Wang,et al. Improving Content-based and Hybrid Music Recommendation using Deep Learning , 2014, ACM Multimedia.
[285] Ayse Betül Oktay,et al. Localization of the Lumbar Discs Using Machine Learning and Exact Probabilistic Inference , 2011, MICCAI.
[286] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[287] Yoshua Bengio,et al. Gated Feedback Recurrent Neural Networks , 2015, ICML.
[288] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[289] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[290] Tomáš Mikolov. Statistical Language Models Based on Neural Networks , 2012 .
[291] Aaron C. Courville,et al. Deep Learning Vector Quantization , 2016, ESANN.
[292] Mark G van Vledder,et al. Sarcopenia negatively impacts short-term outcomes in patients undergoing hepatic resection for colorectal liver metastasis. , 2011, HPB : the official journal of the International Hepato Pancreato Biliary Association.
[293] Yongxin Yang,et al. Deep Multi-task Representation Learning: A Tensor Factorisation Approach , 2016, ICLR.
[294] Rishi Bedi,et al. Deep Reinforcement Learning for Simulated Autonomous Vehicle Control , 2016 .
[295] Kunihiko Fukushima,et al. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..
[296] S. Grossberg. Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks , 1973 .
[297] M. F. Møller,et al. Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time , 1993 .
[298] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[299] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[300] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[301] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[302] Jocelyn Sietsma,et al. Creating artificial neural networks that generalize , 1991, Neural Networks.
[303] S B Heymsfield,et al. Cadaver validation of skeletal muscle measurement by magnetic resonance imaging and computerized tomography. , 1998, Journal of applied physiology.
[304] D. Hubel,et al. Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.
[305] Sridhar Mahadevan,et al. Manifold alignment using Procrustes analysis , 2008, ICML '08.
[306] Geoffrey E. Hinton. Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.
[307] D. S. Jeng,et al. Self-organizing polynomial neural network for modelling complex hydrological processes , 2005 .
[308] Ambedkar Dukkipati,et al. To go deep or wide in learning? , 2014, AISTATS.
[309] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[310] Ronald M. Summers,et al. Detection of Sclerotic Spine Metastases via Random Aggregation of Deep Convolutional Neural Network Classifications , 2014, ArXiv.
[311] Robert Hecht-Nielsen,et al. Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.
[312] Honglak Lee,et al. Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.
[313] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[314] Stanley Heshka,et al. Total body skeletal muscle and adipose tissue volumes: estimation from a single abdominal cross-sectional image. , 2004, Journal of applied physiology.
[315] Ben Glocker,et al. Robust Registration of Longitudinal Spine CT , 2014, MICCAI.
[316] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[317] Aaron Roth,et al. The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..
[318] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[319] S. Linnainmaa. Taylor expansion of the accumulated rounding error , 1976 .
[320] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[321] Shiguang Shan,et al. Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.
[322] Jason Yosinski,et al. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[323] Jürgen Schmidhuber,et al. A local learning algorithm for dynamic feedforward and recurrent networks , 1990, Forschungsberichte, TU Munich.
[324] Edwin V. Bonilla,et al. Multi-task Gaussian Process Prediction , 2007, NIPS.
[325] Neil D. Lawrence,et al. Learning to learn with the informative vector machine , 2004, ICML.
[326] Hayit Greenspan,et al. Deep learning with non-medical training used for chest pathology identification , 2015, Medical Imaging.
[327] D. Hansel,et al. Memorization Without Generalization in a Multilayered Neural Network , 1992 .
[328] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[329] Alex Graves,et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.
[330] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[331] Jürgen Schmidhuber,et al. My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013 , 2013, ArXiv.
[332] Masashi Sugiyama,et al. Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis , 2007, J. Mach. Learn. Res..
[333] Dumitru Erhan,et al. Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[334] L. Mccargar,et al. Cancer cachexia in the age of obesity: skeletal muscle depletion is a powerful prognostic factor, independent of body mass index. , 2013, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.
[335] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[336] E. Thorndike,et al. The influence of improvement in one mental function upon the efficiency of other functions. (I). , 1901 .
[337] Traian Rebedea,et al. Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay , 2016, ArXiv.
[338] Dana H. Ballard,et al. Modular Learning in Neural Networks , 1987, AAAI.
[339] Tony Jan,et al. VQSVM: A case study for incorporating prior domain knowledge into inductive machine learning , 2010, Neurocomputing.
[340] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.
[341] Thomas Brox,et al. Learning to Generate Chairs, Tables and Cars with Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[342] George A. Anastassiou,et al. Approximation theory - moduli of continuity and global smoothness preservation , 1999 .
[343] Yaser S. Abu-Mostafa,et al. Hints and the VC Dimension , 1993, Neural Computation.
[344] Bruno Stuner,et al. Cohort of LSTM and lexicon verification for handwriting recognition with gigantic lexicon , 2016, ArXiv.
[345] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[346] Trevor Darrell,et al. Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[347] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[348] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[349] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[350] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.
[351] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[352] Rob Fergus,et al. Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.
[353] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[354] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.
[355] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[356] Thierry Paquet,et al. A Markovian Approach for Handwritten Document Segmentation , 2006, 18th International Conference on Pattern Recognition (ICPR'06).
[357] Pradeep Dubey,et al. Faster CNNs with Direct Sparse Convolutions and Guided Pruning , 2016, ICLR.
[358] Ken-ichi Funahashi,et al. On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.
[359] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[360] Demetri Terzopoulos,et al. Deformable models in medical image analysis: a survey , 1996, Medical Image Anal..
[361] Tegan Maharaj,et al. Deep Nets Don't Learn via Memorization , 2017, ICLR.
[362] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[363] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[364] S. Dreyfus. The numerical solution of variational problems , 1962 .
[365] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[366] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[367] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.
[368] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[369] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[370] Ben Glocker,et al. Automatic Localization and Identification of Vertebrae in Arbitrary Field-of-View CT Scans , 2012, MICCAI.
[371] Matti Pietikäinen,et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[372] S. Cameron,et al. Automatic spine identification in abdominal CT slices using image partition forests , 2009, 2009 Proceedings of 6th International Symposium on Image and Signal Processing and Analysis.
[373] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.
[374] Yoshua Bengio,et al. Deep Learning of Representations: Looking Forward , 2013, SLSP.
[375] F Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[376] Yoshua Bengio,et al. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.
[377] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[378] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[379] Jürgen Schmidhuber,et al. Co-evolving recurrent neurons learn deep memory POMDPs , 2005, GECCO '05.
[380] Sebastian Thrun,et al. Learning One More Thing , 1994, IJCAI.
[381] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[382] Rich Caruana,et al. Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.
[383] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[384] ChengXiang Zhai,et al. Instance Weighting for Domain Adaptation in NLP , 2007, ACL.
[385] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[386] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[387] Bernard Widrow,et al. MADALINE RULE II: a training algorithm for neural networks , 1988, ICNN.
[388] Jerry L Prince,et al. Current methods in medical image segmentation. , 2000, Annual review of biomedical engineering.
[389] Fernando Corinto,et al. CNN-based algorithm for drusen identification , 2006, 2006 IEEE International Symposium on Circuits and Systems.
[390] J. Knott. The organization of behavior: A neuropsychological theory , 1951 .
[391] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[392] T. Terlaky. On lp programming , 1985 .
[393] Jianmin Wang,et al. Learning Multiple Tasks with Deep Relationship Networks , 2015, ArXiv.
[394] Thomas S. Huang,et al. Interactive Facial Feature Localization , 2012, ECCV.
[395] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[396] Richard M. Schwartz,et al. An Algorithm that Learns What's in a Name , 1999, Machine Learning.
[397] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .
[398] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.
[399] Tom M. Mitchell,et al. The Need for Biases in Learning Generalizations , 2007 .
[400] Rina Dechter,et al. Learning While Searching in Constraint-Satisfaction-Problems , 1986, AAAI.
[401] Narendra Ahuja,et al. Cresceptron: a self-organizing neural network which grows adaptively , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.
[402] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[403] Pascal Vincent,et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.
[404] Shih-Chii Liu,et al. Computation with Spikes in a Winner-Take-All Network , 2009, Neural Computation.