Transfer Learning for Visual Categorization: A Survey

Regular machine learning and data mining techniques study the training data for future inferences under a major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient enough to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by extracting useful information from data in a related domain and transferring them for being used in target tasks. In recent years, with transfer learning being applied to visual categorization, some typical problems, e.g., view divergence in action recognition tasks and concept drifting in image classification tasks, can be efficiently solved. In this paper, we survey state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition.

[1]  Yi Yao,et al.  Boosting for transfer learning with multiple sources , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Emilio Soria Olivas,et al.  Handbook of Research on Machine Learning Applications and Trends : Algorithms , Methods , and Techniques , 2009 .

[3]  Zhaohong Deng,et al.  Knowledge-leverage-based TSK Fuzzy System modeling. , 2013, IEEE transactions on neural networks and learning systems.

[4]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.

[5]  Chris H. Q. Ding,et al.  Orthogonal nonnegative matrix t-factorizations for clustering , 2006, KDD '06.

[6]  Larry S. Davis,et al.  3-D model-based tracking of humans in action: a multi-view approach , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Ling Shao,et al.  Nonlocal Hierarchical Dictionary Learning Using Wavelets for Image Denoising , 2013, IEEE Transactions on Image Processing.

[8]  Li Fei-Fei Knowledge transfer in learning to recognize visual objects classes , 2006 .

[9]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[10]  Ling Shao,et al.  Efficient dictionary learning for visual categorization , 2014, Comput. Vis. Image Underst..

[11]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[12]  Charu C. Aggarwal,et al.  Towards cross-category knowledge propagation for learning visual concepts , 2011, CVPR 2011.

[13]  Gavin C. Cawley,et al.  Leave-One-Out Cross-Validation Based Model Selection Criteria for Weighted LS-SVMs , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[14]  Shih-Fu Chang,et al.  Consumer video understanding: a benchmark database and an evaluation of human and machine performance , 2011, ICMR.

[15]  Ali Farhadi,et al.  Learning to Recognize Activities from the Wrong View Point , 2008, ECCV.

[16]  Qiang Yang,et al.  Transfer Learning via Dimensionality Reduction , 2008, AAAI.

[17]  Ling Shao,et al.  Efficient Search and Localization of Human Actions in Video Databases , 2014, IEEE Transactions on Circuits and Systems for Video Technology.

[18]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[19]  John Blitzer,et al.  Domain Adaptation with Structural Correspondence Learning , 2006, EMNLP.

[20]  Ali Farhadi,et al.  A latent model of discriminative aspect , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Silvio Savarese,et al.  Cross-view action recognition via view knowledge transfer , 2011, CVPR 2011.

[22]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[23]  Yaser Sheikh,et al.  On the use of anthropometry in the invariant analysis of human actions , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[24]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[25]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[26]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[27]  Barbara Caputo,et al.  Safety in numbers: Learning categories from few examples with multi model knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Lorenzo Torresani,et al.  Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach , 2010, NIPS.

[29]  Ling Shao,et al.  Spatio-Temporal Laplacian Pyramid Coding for Action Recognition , 2014, IEEE Transactions on Cybernetics.

[30]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Rama Chellappa,et al.  View Invariance for Human Action Recognition , 2005, International Journal of Computer Vision.

[32]  Qiang Yang,et al.  Cross-domain activity recognition , 2009, UbiComp.

[33]  Ling Shao,et al.  Multi-view action recognition using local similarity random forests and sensor fusion , 2013, Pattern Recognit. Lett..

[34]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[35]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Ling Shao,et al.  Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition , 2013, Pattern Recognit..

[37]  Qiang Yang,et al.  Boosting for transfer learning , 2007, ICML '07.

[38]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Dong Xu,et al.  Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[41]  Andrew Zisserman,et al.  Tabula rasa: Model transfer for object category detection , 2011, 2011 International Conference on Computer Vision.

[42]  Tao Li,et al.  The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[43]  Dong Liu,et al.  Robust visual domain adaptation with low-rank reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[45]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[48]  Zhaohong Deng,et al.  Knowledge-Leverage-Based Fuzzy System and Its Modeling , 2013, IEEE Transactions on Fuzzy Systems.

[49]  Ling Shao,et al.  Weakly-Supervised Cross-Domain Dictionary Learning for Visual Recognition , 2014, International Journal of Computer Vision.

[50]  Ling Shao,et al.  One shot learning gesture recognition from RGBD images , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[51]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[52]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Ruonan Li,et al.  Discriminative virtual views for cross-view action recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Zhaohong Deng,et al.  Knowledge-Leverage-Based TSK Fuzzy System Modeling , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[55]  Bernhard Schölkopf,et al.  Correcting Sample Selection Bias by Unlabeled Data , 2006, NIPS.

[56]  Ivor W. Tsang,et al.  Domain Transfer SVM for video concept detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[58]  Trevor Darrell,et al.  Transfer learning for image classification with sparse prototype representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  M. Alex O. Vasilescu,et al.  Recognizing action events from multiple viewpoints , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[60]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[61]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..

[62]  Xuelong Li,et al.  Transfer latent variable model based on divergence analysis , 2011, Pattern Recognit..

[63]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[64]  Feiping Nie,et al.  Dyadic transfer learning for cross-domain image classification , 2011, 2011 International Conference on Computer Vision.

[65]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[66]  Steven M. Seitz,et al.  View-Invariant Analysis of Cyclic Motion , 1997, International Journal of Computer Vision.

[67]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[68]  Alex Pentland,et al.  Task-Specific Gesture Analysis in Real-Time Using Interpolated Views , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[69]  Barbara Caputo,et al.  Multiclass transfer learning from unconstrained priors , 2011, 2011 International Conference on Computer Vision.

[70]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[71]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[72]  Ling Shao,et al.  Learning Deep and Wide: A Spectral Method for Learning Deep Networks , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[73]  Jieping Ye,et al.  SVM versus Least Squares SVM , 2007, AISTATS.

[74]  Ling Shao,et al.  Learning Object-to-Class Kernels for Scene Classification , 2014, IEEE Transactions on Image Processing.

[75]  Xiaodong Yu,et al.  Attribute-Based Transfer Learning for Object Categorization with Zero/One Training Example , 2010, ECCV.

[76]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  Xuelong Li,et al.  Transfer learning for pedestrian detection , 2013, Neurocomputing.

[78]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[79]  Rama Chellappa,et al.  Cross-View Action Recognition via a Transferable Dictionary Pair , 2012, BMVC.

[80]  Zicheng Liu,et al.  Cross-dataset action detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[81]  Carlos Orrite-Uruñuela,et al.  One-Sequence Learning of Human Actions , 2011, HBU.

[82]  Ling Shao,et al.  Feature Learning for Image Classification Via Multiobjective Genetic Programming , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[83]  Yang Wang,et al.  Max-margin hidden conditional random fields for human action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[84]  Ling Shao,et al.  Correspondence-Free Dictionary Learning for Cross-View Action Recognition , 2014, 2014 22nd International Conference on Pattern Recognition.

[85]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[86]  Luc Van Gool,et al.  A Hough transform-based voting framework for action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[87]  Patrick Pérez,et al.  Cross-View Action Recognition from Temporal Self-similarities , 2008, ECCV.

[88]  L. Shao,et al.  From Heuristic Optimization to Dictionary Learning: A Review and Comprehensive Comparison of Image Denoising Algorithms , 2014, IEEE Transactions on Cybernetics.

[89]  Ivor W. Tsang,et al.  Domain Transfer Multiple Kernel Learning , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[90]  Thomas L. Griffiths,et al.  Learning author-topic models from text corpora , 2010, TOIS.

[91]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[92]  Shuicheng Yan,et al.  SIFT-Bag kernel for video event analysis , 2008, ACM Multimedia.

[93]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[94]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[95]  Yongdong Zhang,et al.  Multiview Spectral Embedding , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[96]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[97]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.