Event Recognition in Videos by Learning from Heterogeneous Web Sources

In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos, without requiring any labeled consumer videos. We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which samples from different source domains may be represented by different types of features with different dimensions (e.g., SIFT features from web images and space-time (ST) features from web videos), while the target domain samples have all types of features. To cope effectively with heterogeneous sources, some of which are more relevant to the target domain than others, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to learn an optimal target classifier. MDA-HS simultaneously seeks the optimal weights for the source domains with their different types of features and infers the labels of the unlabeled target domain data based on multiple types of features. We solve the resulting optimization problem with a cutting-plane algorithm based on group-based multiple kernel learning. Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos.
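The interplay between source weighting and target label inference can be illustrated with a deliberately simplified sketch. It is not the MDA-HS formulation: the cutting-plane solver and group-based multiple kernel learning are replaced here by a plain agreement heuristic over linear ridge classifiers, and all names (`fit_ridge`, `combine_sources`, the `web_images` and `web_videos` keys) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge classifier: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def combine_sources(sources, target_views, n_iter=10):
    """Alternate between inferring target labels and reweighting sources.

    sources:      name -> (X_src, y_src), labels in {-1, +1}; feature
                  dimensions may differ from one source to another.
    target_views: name -> X_tgt, the target samples described with the
                  same feature type as the source of that name.
    """
    names = sorted(sources)
    # Decision values of each source classifier on its matching target view.
    F = np.column_stack(
        [target_views[n] @ fit_ridge(*sources[n]) for n in names]
    )
    weights = np.full(len(names), 1.0 / len(names))
    for _ in range(n_iter):
        # Infer target labels from the weighted vote of all sources.
        labels = np.where(F @ weights >= 0, 1.0, -1.0)
        # Reweight each source by how often it agrees with those labels.
        agreement = (np.sign(F) == labels[:, None]).mean(axis=0) + 1e-12
        weights = agreement / agreement.sum()
    return dict(zip(names, weights)), labels


# Synthetic stand-in data (hypothetical): a relevant 5-D "image" source
# and an irrelevant 3-D "video" source whose labels are pure noise.
rng = np.random.default_rng(0)
n = 200
mu = np.ones(5)
y_img = rng.choice([-1.0, 1.0], size=n)
y_vid = rng.choice([-1.0, 1.0], size=n)
y_tgt = rng.choice([-1.0, 1.0], size=n)  # held out, used only to score
sources = {
    "web_images": (y_img[:, None] * mu + rng.normal(size=(n, 5)), y_img),
    "web_videos": (rng.normal(size=(n, 3)), y_vid),
}
target_views = {
    "web_images": y_tgt[:, None] * mu + rng.normal(size=(n, 5)),
    "web_videos": rng.normal(size=(n, 3)),
}
weights, labels = combine_sources(sources, target_views)
accuracy = float(np.mean(labels == y_tgt))
```

In this toy setting the source whose classifier transfers well to the target should receive the larger weight, which mirrors the motivation stated above; the actual method learns the weights and labels jointly within a single optimization problem rather than by alternating heuristics.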
