Image categorization combining neighborhood methods and boosting

We describe an efficient and scalable system for automatic image categorization. Our approach combines scalable "model-free" neighborhood-based annotation with accurate boosting-based per-tag modeling. To accelerate neighborhood-based classification, we use a set of spatial data structures as weak classifiers for an arbitrary number of categories. We employ standard edge and color features and an approximation scheme that scales to large training sets. The weak classifier outputs are combined in a tag-dependent fashion via boosting to improve accuracy. The method performs competitively with standard SVM-based per-tag classification while substantially reducing computational requirements. We present multi-label image annotation experiments on data sets of more than two million photos.
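The combination described above can be illustrated with a minimal sketch. It is not the system's implementation. It makes several assumptions: random low-dimensional projections with brute-force nearest-neighbor search stand in for the spatial data structures, synthetic vectors stand in for edge and color features, and discrete AdaBoost is used as the per-tag boosting step. All function names and parameters here (`knn_votes`, `weak_outputs`, `boost_tag`, `k`, `rounds`) are hypothetical.

```python
import numpy as np

def knn_votes(train_x, train_tags, query_x, k):
    """Fraction of the k nearest training points that carry each tag."""
    # Assumption: brute-force distances stand in for a spatial index.
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return train_tags[nearest].mean(axis=1)  # shape (n_query, n_tags)

def weak_outputs(projections, train_x, train_tags, query_x, k=5):
    """One vote matrix per projection; each serves all tags at once."""
    outs = [knn_votes(train_x @ p, train_tags, query_x @ p, k)
            for p in projections]
    return np.stack(outs, axis=0)  # shape (n_weak, n_query, n_tags)

def boost_tag(h, y, rounds=5):
    """Discrete AdaBoost over a fixed pool of weak classifiers for one tag.

    h: (n_weak, n) weak predictions in {-1, +1}; y: (n,) labels in {-1, +1}.
    Returns one non-negative weight per weak classifier.
    """
    n_weak, n = h.shape
    w = np.full(n, 1.0 / n)
    alpha = np.zeros(n_weak)
    for _ in range(rounds):
        err = ((h != y) * w).sum(axis=1)      # weighted error per classifier
        j = int(np.argmin(err))
        e = float(np.clip(err[j], 1e-10, 1 - 1e-10))
        if e >= 0.5:                          # nothing better than chance
            break
        a = 0.5 * np.log((1 - e) / e)
        alpha[j] += a
        w *= np.exp(-a * y * h[j])            # upweight misclassified points
        w /= w.sum()
    return alpha

# Synthetic stand-in data: 8-dimensional features, two tags.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.normal(size=(n, 8))
    tags = np.stack([x[:, 0] > 0, x[:, 1] + x[:, 2] > 0], axis=1)
    return x, tags.astype(float)

index_x, index_tags = make_data(400)   # points stored in the neighbor index
boost_x, boost_tags = make_data(200)   # held-out points used for boosting
projections = [rng.normal(size=(8, 3)) for _ in range(6)]

votes = weak_outputs(projections, index_x, index_tags, boost_x)
h = np.where(votes >= 0.5, 1, -1)      # threshold votes into weak predictions
y = np.where(boost_tags > 0.5, 1, -1)

# Tag-dependent combination: a separate weight vector for each tag.
alphas = np.stack([boost_tag(h[:, :, t], y[:, t]) for t in range(y.shape[1])])
pred = np.where(np.einsum("tj,jnt->nt", alphas, h) >= 0, 1, -1)
```

In this sketch the neighbor search is shared across all tags and only the small weight vectors in `alphas` are tag-specific, which is one way to read the claim that per-tag modeling can be added at low cost on top of a model-free neighborhood stage.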
