Scalable greedy algorithms for transfer learning

We study binary transfer learning that leverages auxiliary source classifiers. We propose two efficient algorithms that select relevant sources from a large pool. One of the algorithms has a computational cost independent of the number of sources. Our algorithms achieve state-of-the-art results on three computer vision datasets. We prove theoretically that our algorithm can learn effectively from few examples.

In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield good performance on a target task. To keep the scenario realistic, we do not assume direct access to the source data; instead, we use the source hypotheses trained on that data. We propose an efficient algorithm that simultaneously selects relevant source hypotheses and feature dimensions, building on the literature on the best subset selection problem. Our algorithm achieves state-of-the-art results on three computer vision datasets, substantially outperforming both transfer learning and popular feature selection baselines in a small-sample setting. We also present a randomized variant that achieves the same results at a computational cost independent of the number of source hypotheses and feature dimensions. Finally, we prove theoretically that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
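
To make the idea of selecting source hypotheses and feature dimensions jointly more concrete, the following is a minimal sketch of generic greedy forward selection over a regularized least-squares objective. It is an illustration under stated assumptions, not necessarily the algorithm proposed in the paper. Each target example is assumed to be represented by its raw features concatenated with the outputs of the source hypotheses on that example, and the names `solve`, `ridge_fit`, `greedy_select`, the parameter `lam`, and the toy data are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_fit(Z, y, subset, lam):
    """Regularized least squares restricted to the columns in `subset`.

    Returns the weights and the value of the objective
    (1/m) * sum((prediction - y)^2) + lam * ||w||^2.
    """
    m = len(y)
    cols = [[row[j] for row in Z] for j in subset]
    A = [[sum(a * b for a, b in zip(ci, cj)) / m + (lam if i == j else 0.0)
          for j, cj in enumerate(cols)] for i, ci in enumerate(cols)]
    rhs = [sum(a * t for a, t in zip(ci, y)) / m for ci in cols]
    w = solve(A, rhs)
    preds = [sum(wi * ci[n] for wi, ci in zip(w, cols)) for n in range(m)]
    obj = sum((p - t) ** 2 for p, t in zip(preds, y)) / m
    obj += lam * sum(wi * wi for wi in w)
    return w, obj


def greedy_select(Z, y, k, lam=0.1):
    """Add, k times, the column that most reduces the objective."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(len(Z[0])) if j not in selected]
        best = min(candidates,
                   key=lambda j: ridge_fit(Z, y, selected + [j], lam)[1])
        selected.append(best)
    weights, _ = ridge_fit(Z, y, selected, lam)
    return selected, weights


# Toy target task with 8 labeled examples (hypothetical data).
y = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
f0 = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0]      # raw feature
f1 = [0.5, 0.2, -0.3, 0.1, -0.4, 0.6, 0.2, -0.1]       # raw feature
h_bad = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]   # unrelated source
h_good = [0.9 * t for t in y]                          # relevant source
Z = [list(row) for row in zip(f0, f1, h_bad, h_good)]

selected, weights = greedy_select(Z, y, k=1)
print(selected, weights)
```

On this toy data the first column picked is index 3, the relevant source hypothesis, because the other three columns are uncorrelated with the labels. Each greedy step here refits the model once per remaining candidate, so its cost grows with the number of candidates; the randomized variant mentioned in the abstract is described as avoiding that dependence, which this sketch does not attempt to reproduce.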
