Extracting Privileged Information from Untagged Corpora for Classifier Learning

Data-driven learning approaches often perform poorly when the training data is inadequate in either quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags, or properties, is usually incorporated to improve classifier learning. However, manual labeling is time-consuming and labor-intensive. To address this issue, we propose to enhance classifier learning by extracting PI from untagged corpora, which can eliminate the dependency on manually labeled data. Specifically, we treat each selected PI as a subcategory and learn one classifier per subcategory independently. The classifiers for all subcategories are then integrated to form a more powerful category classifier. In particular, we propose a new instance-level multi-instance learning (MIL) model that simultaneously selects a subset of training images from each subcategory and learns the optimal classifiers based on the selected images. Extensive experiments demonstrate the superiority of our approach.
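The idea of learning one classifier per subcategory and then integrating them can be illustrated with a minimal sketch. This is not the paper's MIL model: it assumes plain feature vectors, substitutes a ridge-regularized least-squares linear classifier for the per-subcategory learner, and assumes max-pooling over subcategory scores as the integration rule. All function names (`train_linear`, `train_category`, `category_score`) are hypothetical.

```python
import numpy as np


def _with_bias(X):
    """Append a constant column so the last weight acts as a bias term."""
    return np.hstack([X, np.ones((len(X), 1))])


def train_linear(X_pos, X_neg, ridge=1e-3):
    """Fit a ridge-regularized least-squares classifier (+1 vs. -1).

    Stand-in learner (an assumption); the paper uses an instance-level
    MIL formulation instead.
    """
    X = _with_bias(np.vstack([X_pos, X_neg]))
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def train_category(subcategories, X_neg):
    """Learn one classifier per subcategory, each independently."""
    return [train_linear(X_pos, X_neg) for X_pos in subcategories]


def category_score(classifiers, X):
    """Integrate subcategory classifiers by taking the maximum score
    (an assumed integration rule, chosen for simplicity)."""
    Xb = _with_bias(X)
    scores = np.stack([Xb @ w for w in classifiers])
    return scores.max(axis=0)
```

With max-pooling, a sample is accepted by the category classifier if any one subcategory classifier scores it positively, which lets visually distinct subcategories be modeled separately.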
