Stan Matwin | Lisa Di-Jorio | Lucas May Petry | Farshid Varno
[1] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[2] David Faust,et al. Eliminating the hindsight bias. , 1988 .
[3] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[4] Mark S. Boddy,et al. Solving Time-Dependent Planning Problems , 1989, IJCAI.
[5] J. Schmidhuber,et al. A neural network that embeds its own meta-levels , 1993, IEEE International Conference on Neural Networks.
[6] Richard E. Korf,et al. A Complete Anytime Algorithm for Number Partitioning , 1998, Artif. Intell..
[7] Ning Qian,et al. On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.
[8] J. Kruger,et al. Unskilled and unaware of it: how difficulties in recognizing one's own incompetence lead to inflated self-assessments. , 1999, Journal of personality and social psychology.
[9] Sabina Kleitman,et al. The Role of Individual Differences in the Accuracy of Confidence Judgments , 2002, The Journal of general psychology.
[10] Anders Winman,et al. Subjective probability intervals: how to reduce overconfidence by interval evaluation. , 2004, Journal of experimental psychology. Learning, memory, and cognition.
[11] Yann LeCun,et al. The MNIST database of handwritten digits , 2005 .
[12] Alan C. Weinstein,et al. Anti-Defamation League of B'Nai B'rith , 2006 .
[13] G. Griffin,et al. Caltech-256 Object Category Dataset , 2007 .
[14] S. Thompson. Social Learning Theory , 2008 .
[15] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[16] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[17] Fiona Fidler,et al. Reducing Overconfidence in the Interval Judgments of Experts , 2010, Risk analysis : an official publication of the Society for Risk Analysis.
[18] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[19] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[20] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.
[21] Yoshua Bengio,et al. Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.
[22] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[23] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[24] Rob Fergus,et al. Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.
[25] D. Runia,et al. Title of the Work , 2019, Philo of Alexandria: On the Life of Abraham.
[26] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[27] Yoshua Bengio,et al. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.
[28] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[30] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[31] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[32] Jitendra Malik,et al. Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.
[33] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[34] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[35] Florent Meyniel,et al. The Sense of Confidence during Probabilistic Learning: A Normative Account , 2015, PLoS Comput. Biol..
[36] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[37] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[38] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[39] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[40] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Trevor Darrell,et al. Best Practices for Fine-Tuning Visual Classifiers to New Domains , 2016, ECCV Workshops.
[43] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[44] Xinbo Chen,et al. Evaluating the Energy Efficiency of Deep Convolutional Neural Networks on CPUs and GPUs , 2016, 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom).
[45] Joachim Denzler,et al. Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets , 2016, BMVC.
[46] Jiri Matas,et al. All you need is a good init , 2015, ICLR.
[47] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[48] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[49] Yang Zhong,et al. Face attribute prediction using off-the-shelf CNN features , 2016, 2016 International Conference on Biometrics (ICB).
[50] Christoph H. Lampert,et al. iCaRL: Incremental Classifier and Representation Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[52] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[53] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Richard Socher,et al. Improving Generalization Performance by Switching from Adam to SGD , 2017, ArXiv.
[55] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[56] David Barber,et al. Nesterov's accelerated gradient and momentum as approximations to regularised update descent , 2016, 2017 International Joint Conference on Neural Networks (IJCNN).
[57] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[58] Richard S. Zemel,et al. Prototypical Networks for Few-shot Learning , 2017, NIPS.
[59] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[60] Hang Li,et al. Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.
[61] Brian McWilliams,et al. The Shattered Gradients Problem: If resnets are the answer, then what is the question? , 2017, ICML.
[62] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Sashank J. Reddi,et al. On the Convergence of Adam and Beyond , 2018, ICLR.
[64] David Dunning,et al. Overconfidence Among Beginners: Is a Little Learning a Dangerous Thing? , 2018, Journal of personality and social psychology.
[65] Kilian Q. Weinberger,et al. Multi-Scale Dense Networks for Resource Efficient Image Classification , 2017, ICLR.
[66] Fred A. Hamprecht,et al. Essentially No Barriers in Neural Network Energy Landscape , 2018, ICML.
[67] Yiyang Zhao,et al. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search , 2019, ArXiv.
[68] J. Schulman,et al. Reptile: a Scalable Metalearning Algorithm , 2018 .
[69] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[70] Pieter Abbeel,et al. A Simple Neural Attentive Meta-Learner , 2017, ICLR.
[71] Derek Hoiem,et al. Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[72] Joshua Achiam,et al. On First-Order Meta-Learning Algorithms , 2018, ArXiv.
[73] Kaiming He,et al. Group Normalization , 2018, ECCV.
[74] Guojun Lu,et al. Transfer Learning Using Classification Layer Features of CNN , 2018 .
[75] Nikos Komodakis,et al. Dynamic Few-Shot Visual Learning Without Forgetting , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[76] Jascha Sohl-Dickstein,et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks , 2018, ICML.
[77] Aaron Klein,et al. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search , 2018, ArXiv.
[78] Xiangyu Zhang,et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.
[79] Ondrej Bojar,et al. Training Tips for the Transformer Model , 2018, Prague Bull. Math. Linguistics.
[80] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[81] Jianfeng Zhan,et al. Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks , 2017, ICANN.
[82] Matthew A. Brown,et al. Low-Shot Learning with Imprinted Weights , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[83] Yen-Cheng Liu,et al. Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines , 2018, ArXiv.
[84] Xu Sun,et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate , 2019, ICLR.
[85] Lu Lu,et al. Dying ReLU and Initialization: Theory and Numerical Examples , 2019, Communications in Computational Physics.
[86] Jindong Wang,et al. Easy Transfer Learning By Exploiting Intra-Domain Structures , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).
[87] Andreas S. Tolias,et al. Three scenarios for continual learning , 2019, ArXiv.
[88] Stan Matwin,et al. Efficient Neural Task Adaptation by Maximum Entropy Initialization , 2019, ArXiv.
[89] Zhi Zhang,et al. Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[90] M. Burtsev,et al. Loss Surface Sightseeing by Multi-Point Optimization , 2019 .
[91] Kaiming He,et al. Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[92] Debadeepta Dey,et al. Learning Anytime Predictions in Neural Networks via Adaptive Loss Balancing , 2017, AAAI.
[93] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[94] Yannis Avrithis,et al. Dense Classification and Implanting for Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[95] Yoshua Bengio,et al. The Benefits of Over-parameterization at Initialization in Deep ReLU Networks , 2019, ArXiv.
[96] M. Maire,et al. ALERT: Accurate Anytime Learning for Energy and Timeliness , 2019, ArXiv.
[97] Quoc V. Le,et al. Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[98] Radu Calinescu,et al. Assuring the Machine Learning Lifecycle , 2019, ACM Comput. Surv..
[99] Tengyu Ma,et al. Fixup Initialization: Residual Learning Without Normalization , 2019, ICLR.
[100] Sashank J. Reddi,et al. Why ADAM Beats SGD for Attention Models , 2019, ArXiv.
[101] Liang Zhao,et al. Interpreting and Evaluating Neural Network Robustness , 2019, IJCAI.
[102] Jon Kleinberg,et al. Transfusion: Understanding Transfer Learning for Medical Imaging , 2019, NeurIPS.
[103] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[104] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[105] Wojciech M. Czarnecki,et al. A Deep Neural Network's Loss Surface Contains Every Low-dimensional Pattern , 2019, ArXiv.
[106] Yao Zhang,et al. Geometry of energy landscapes and the optimizability of deep neural networks , 2018, Physical review letters.
[107] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[108] Willem Zuidema,et al. Transferring Inductive Biases through Knowledge Distillation , 2020, ArXiv.
[109] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[110] Mohammad Havaei,et al. Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay , 2020, ArXiv.
[111] Michael Maire,et al. ALERT: Accurate Learning for Energy and Timeliness , 2019, USENIX Annual Technical Conference.
[112] Sashank J. Reddi,et al. Why are Adaptive Methods Good for Attention Models? , 2020, NeurIPS.
[113] R. Thomas McCoy,et al. Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks , 2020, TACL.
[114] Ali Farhadi,et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping , 2020, ArXiv.
[115] Stefano Soatto,et al. A Baseline for Few-Shot Image Classification , 2019, ICLR.