[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[2] Alexander H. Waibel, et al. Adaptively Growing Hierarchical Mixtures of Experts, 1996, NIPS.
[3] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[4] John J. Grefenstette, et al. Case-Based Anytime Learning, 1994.
[5] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[6] Martial Hebert, et al. Growing a Brain: Fine-Tuning by Increasing Model Capacity, 2017, CVPR.
[7] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[8] Gregory Cohen, et al. EMNIST: Extending MNIST to handwritten letters, 2017, IJCNN.
[9] Steve R. Waterhouse, et al. Constructive Algorithms for Hierarchical Mixtures of Experts, 1995, NIPS.
[10] Sebastian Thrun, et al. Lifelong Learning Algorithms, 1998, Learning to Learn.
[11] Sebastian Thrun, et al. A lifelong learning perspective for mobile robot control, 1994, IROS.
[12] Bo Liu, et al. Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks, 2021, NeurIPS.
[13] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[14] Feng Yan, et al. AutoGrow: Automatic Layer Growing in Deep Convolutional Networks, 2019, KDD.
[15] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[16] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[17] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[18] P. Bühlmann, et al. Analyzing Bagging, 2001.
[19] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[20] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[21] Ludovic Denoyer, et al. Efficient Continual Learning with Modular Networks and Task-Driven Priors, 2020, ArXiv.
[22] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.
[23] Sergio Escalera, et al. Towards Automated Deep Learning: Analysis of the AutoDL challenge series 2019, 2019, PMLR.
[24] Sung Ju Hwang, et al. Lifelong Learning with Dynamically Expandable Networks, 2017, ICLR.
[25] Ludovic Denoyer. Deep Sequential Neural Networks, 2014.
[26] Qiang Yang, et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.
[27] Marc'Aurelio Ranzato, et al. Learning Factored Representations in a Deep Mixture of Experts, 2013, ICLR.
[28] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[29] John J. Grefenstette, et al. An Approach to Anytime Learning, 1992, ML.
[30] Xuan Liang, et al. On the Subbagging Estimation for Massive Data, 2021.
[31] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[32] Qiang Liu, et al. Splitting Steepest Descent for Growing Neural Architectures, 2019, NeurIPS.
[33] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, ArXiv.
[34] Leo Breiman. Pasting Small Votes for Classification in Large Databases and On-Line, 1999.
[35] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[36] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[37] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[38] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.
[39] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.