Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection

Multiple-instance learning (MIL) has been widely investigated in image annotation for its ability to exploit region-level visual information in images. Recent studies show that, by performing feature mapping, MIL can be cast as a single-instance learning problem and can thus be solved by traditional supervised learning methods. However, existing feature-mapping approaches usually overlook the discriminative ability of the generated features and the noise they contain. In this paper, we propose an MIL method with discriminative feature mapping and feature selection to address this problem. Our method can explore both positive and negative concept correlations. It can also select effective features for each concept from a large and diverse set of low-level features under MIL settings. Experimental results and comparisons with other methods demonstrate the effectiveness of our approach.
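
To illustrate how feature mapping can turn an MIL problem into a single-instance one, the following is a minimal sketch and not the method proposed in this paper. It assumes a generic similarity-based embedding: each bag (an image, treated as a set of region feature vectors) is mapped to one fixed-length vector whose k-th entry is the similarity between prototype k and the closest instance in the bag. The function name `embed_bag`, the Gaussian similarity, the `sigma` parameter, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def embed_bag(bag, prototypes, sigma=1.0):
    """Map a bag (n_instances x d) to a vector with one feature per prototype.

    Assumed mapping (illustrative only): feature k is the Gaussian similarity
    between prototype k and the nearest instance in the bag.
    """
    # Squared Euclidean distance between every instance and every prototype.
    d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Keep, for each prototype, the similarity of the best-matching instance.
    return np.exp(-d2 / sigma**2).max(axis=0)

# Toy data: two bags with different numbers of 2-D instances.
bags = [
    np.array([[0.0, 0.0], [5.0, 5.0]]),
    np.array([[5.0, 5.0], [6.0, 6.0], [9.0, 9.0]]),
]
# Hypothetical prototypes; in practice these might be training instances.
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])

# One row per bag: an ordinary feature matrix for any supervised learner.
X = np.vstack([embed_bag(b, prototypes) for b in bags])
print(X.shape)
```

After this mapping, every bag is a single row of `X` regardless of how many instances it contains, so a standard classifier can be trained on `X` with the bag labels. Feature selection, as discussed in the abstract, would then amount to choosing which columns (prototypes) to keep.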
