Visual Recognition by Learning From Web Data via Weakly Supervised Domain Generalization

In this paper, a weakly supervised domain generalization (WSDG) method is proposed for real-world visual recognition tasks, in which we train classifiers by using Web data (e.g., Web images and Web videos) with noisy labels. In particular, two challenging problems need to be solved when learning robust classifiers, in which the first issue is to cope with the label noise of training Web data from the source domain, while the second issue is to enhance the generalization capability of learned classifiers to an arbitrary target domain. In order to handle the first problem, the training samples within each category are partitioned into clusters, where we use one bag to denote each cluster and instances to denote the samples in each cluster. Then, we identify a proportion of good training samples in each bag and train robust classifiers by using the good training samples, which leads to a multi-instance learning (MIL) problem. In order to handle the second problem, we assume that the training samples possibly form a set of hidden domains, with each hidden domain associated with a distinctive data distribution. Then, for each category and each hidden latent domain, we propose to learn one classifier by extending our MIL formulation, which leads to our WSDG approach. In the testing stage, our approach can obtain better generalization capability by effectively integrating multiple classifiers from different latent domains in each category. Moreover, our WSDG approach is further extended to utilize additional textual descriptions associated with Web data as privileged information (PI), although testing data do not have such PI. Extensive experiments on three benchmark data sets indicate that our newly proposed methods are effective for real-world visual recognition tasks by learning from Web data.

[1]  Ivor W. Tsang,et al.  This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Domain Adaptation from Multiple Sources: A Domain- , 2022 .

[2]  Ivor W. Tsang,et al.  Text-based image retrieval using progressive multi-instance learning , 2011, 2011 International Conference on Computer Vision.

[3]  M. Kloft,et al.  l p -Norm Multiple Kernel Learning , 2011 .

[4]  Jiebo Luo,et al.  Kodak consumer video benchmark data set : concept definition and annotation * * , 2008 .

[5]  Rama Chellappa,et al.  Visual Domain Adaptation: A survey of recent advances , 2015, IEEE Signal Processing Magazine.

[6]  Dong Liu,et al.  Robust visual domain adaptation with low-rank reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Dong Xu,et al.  Visual recognition by learning from web data: A weakly supervised domain generalization approach , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Peter Tiño,et al.  Incorporating Privileged Information Through Metric Learning , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[9]  Brian C. Lovell,et al.  Unsupervised Domain Adaptation by Domain Invariant Projection , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Bernhard Schölkopf,et al.  Domain Generalization via Invariant Feature Representation , 2013, ICML.

[12]  Pierre-Antoine Absil,et al.  Newton-KKT interior-point methods for indefinite quadratic programming , 2007, Comput. Optim. Appl..

[13]  John Shawe-Taylor,et al.  Two view learning: SVM-2K, Theory and Practice , 2005, NIPS.

[14]  Dong Xu,et al.  Event Recognition in Videos by Learning from Heterogeneous Web Sources , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[16]  Ming Shao,et al.  Deep Low-Rank Coding for Transfer Learning , 2015, IJCAI.

[17]  Razvan C. Bunescu,et al.  Multiple instance learning for sparse positive bags , 2007, ICML '07.

[18]  Ivor W. Tsang,et al.  Domain Transfer Multiple Kernel Learning , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Li Cheng,et al.  Semi-supervised Domain Adaptation on Manifolds , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[20]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[21]  Sebastian Nowozin,et al.  Infinite Kernel Learning , 2008, NIPS 2008.

[22]  Dong Xu,et al.  Distance Metric Learning Using Privileged Information for Face Verification and Person Re-Identification , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[23]  Lorenzo Bruzzone,et al.  Domain Adaptation Problems: A DASVM Classification Technique and a Circular Validation Strategy , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Dong Xu,et al.  Exploiting Privileged Information from Web Data for Image Categorization , 2014, ECCV.

[25]  Shih-Fu Chang,et al.  Consumer video understanding: a benchmark database and an evaluation of human and machine performance , 2011, ICMR.

[26]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[27]  Jason J. Corso,et al.  Latent Domains Modeling for Visual Domain Adaptation , 2014, AAAI.

[28]  Lorenzo Torresani,et al.  Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach , 2010, NIPS.

[29]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[30]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[31]  Kristen Grauman,et al.  Reshaping Visual Datasets for Domain Adaptation , 2013, NIPS.

[32]  Ming Shao,et al.  Latent Low-Rank Transfer Subspace Learning for Missing Modality Recognition , 2014, AAAI.

[33]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Andrew Zisserman,et al.  Discriminative Sub-categorization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Zhuowen Tu,et al.  Max-Margin Multiple-Instance Dictionary Learning , 2013, ICML.

[36]  Dong Xu,et al.  Exploiting Low-Rank Structure from Latent Domains for Domain Generalization , 2014, ECCV.

[37]  Ivor W. Tsang,et al.  A Convex Method for Locating Regions of Interest with Multi-instance Learning , 2009, ECML/PKDD.

[38]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[39]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[40]  Alexei A. Efros,et al.  Undoing the Damage of Dataset Bias , 2012, ECCV.

[41]  Trevor Darrell,et al.  Discovering Latent Domains for Multisource Domain Adaptation , 2012, ECCV.

[42]  Christoph H. Lampert,et al.  Learning to Rank Using Privileged Information , 2013, 2013 IEEE International Conference on Computer Vision.

[43]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[44]  P. Bartlett,et al.  ` p-Norm Multiple Kernel Learning , 2008 .

[45]  Dong Xu,et al.  Multi-view Domain Generalization for Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[47]  Vladimir Vapnik,et al.  A new learning paradigm: Learning using privileged information , 2009, Neural Networks.

[48]  Luo Si,et al.  M3IC: Maximum Margin Multiple Instance Clustering , 2009, IJCAI.

[49]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[50]  Dong Xu,et al.  Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Ming Shao,et al.  Generalized Transfer Subspace Learning Through Low-Rank Constraint , 2014, International Journal of Computer Vision.

[52]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[53]  Zhi-Hua Zhou,et al.  M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[54]  Bernhard Schölkopf,et al.  Correcting Sample Selection Bias by Unlabeled Data , 2006, NIPS.

[55]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[56]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[57]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .