PPP: Joint Pointwise and Pairwise Image Label Prediction

Pointwise and pairwise labels are both widely used in computer vision tasks. For example, supervised image classification and annotation approaches use pointwise labels, while attribute-based image relative learning often adopts pairwise labels. These two types of labels are usually considered independently, and most existing efforts utilize them separately. However, pointwise labels in image classification and tag annotation are inherently related to pairwise labels. For example, an image labeled with "coast" and annotated with "beach, sea, sand, sky" is more likely to have a higher ranking score in terms of the attribute "open", while "men shoes" ranked highly on the attribute "formal" are more likely to be annotated with "leather, lace up" than with "buckle, fabric". The existence of potential relations between pointwise labels and pairwise labels motivates us to fuse them in order to jointly address related vision tasks. In particular, we provide a principled way to capture the relations between class labels, tags, and attributes, and propose a novel framework, PPP (Pointwise and Pairwise image label Prediction), which is based on an overlapped group structure extracted from the pointwise-pairwise-label bipartite graph. With experiments on benchmark datasets, we demonstrate that the proposed framework outperforms state-of-the-art methods on three vision tasks.
