Paper information

Growing a Brain: Fine-Tuning by Increasing Model Capacity

CNNs have made an undeniable impact on computer vision through the ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset. This is usually accomplished through fine-tuning a fixed-size network on new target data. Indeed, virtually every contemporary visual recognition system makes use of fine-tuning to transfer knowledge from ImageNet. In this work, we analyze what components and parameters change during fine-tuning, and discover that increasing model capacity allows for more natural model adaptation through fine-tuning. By making an analogy to developmental learning, we demonstrate that growing a CNN with additional units, either by widening existing layers or deepening the overall network, significantly outperforms classic fine-tuning approaches. To grow a network properly, however, we show that newly added units must be appropriately normalized to allow for a pace of learning that is consistent with existing units. We empirically validate our approach on several benchmark datasets, producing state-of-the-art results.
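The widening idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say how new units are normalized, so the rule used here (rescaling the new units so that their mean absolute pre-activation on a batch matches that of the pre-trained units) is an assumption chosen for illustration. The function names `widen_layer` and `match_new_unit_scale`, the layer sizes, and the random data are likewise hypothetical.

```python
import numpy as np

def widen_layer(W_old, b_old, n_new, rng):
    """Append n_new randomly initialised units (rows) to a fully connected layer."""
    W_new = rng.normal(0.0, 0.01, size=(n_new, W_old.shape[1]))
    b_new = np.zeros(n_new)
    return np.vstack([W_old, W_new]), np.concatenate([b_old, b_new])

def match_new_unit_scale(W, b, n_old, X):
    """Rescale the new units so their mean |pre-activation| on batch X
    matches that of the existing units (an assumed normalization rule)."""
    Z = X @ W.T + b
    old_scale = np.abs(Z[:, :n_old]).mean()
    new_scale = np.abs(Z[:, n_old:]).mean()
    factor = old_scale / (new_scale + 1e-12)
    W, b = W.copy(), b.copy()
    W[n_old:] *= factor  # scaling weights and biases together scales
    b[n_old:] *= factor  # the new units' pre-activations by `factor`
    return W, b, factor

rng = np.random.default_rng(0)
W_old = rng.normal(0.0, 1.0, size=(4, 8))  # stand-in for pre-trained weights
b_old = np.zeros(4)
X = rng.normal(size=(32, 8))               # stand-in batch of input features

W, b = widen_layer(W_old, b_old, n_new=2, rng=rng)
W, b, factor = match_new_unit_scale(W, b, n_old=4, X=X)
Z = X @ W.T + b
print("old units mean |z|:", np.abs(Z[:, :4]).mean())
print("new units mean |z|:", np.abs(Z[:, 4:]).mean())
```

In this sketch the pre-trained rows are left untouched and only the appended rows are rescaled, so the existing units behave exactly as before while the new units start at a comparable activation scale.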
