Object image retrieval by exploiting online knowledge resources

We describe a method to retrieve images found on Web pages with specified object class labels, using an analysis of the text around the image and of the image appearance. Our method determines whether an object is both described in text and appears in an image using a discriminative image model and a generative text model. Our models are learnt by exploiting established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). These resources provide rich text and object appearance information. We report results on two data sets. The first is Berg's collection of ten animal categories; on this data set, we outperform previous approaches (Berg et al., 2006; Schroff et al., 2007). We have also collected five more categories. Experimental results show the effectiveness of our approach on this new data set.
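The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: a discriminative appearance model (here an SVM with Platt-scaled probabilities, cf. [16]) scores the image, a generative unigram language model trained on the class's Wikipedia page (with Dirichlet smoothing, cf. [29]) scores the surrounding text, and the two log-scores are combined to rank candidate Web images. The additive fusion rule, the function names, and all parameter values are illustrative assumptions, not the paper's actual formulation.

# Sketch only: rank Web images for one object class by combining an
# image-appearance score with a text score. All names and the fusion
# rule are assumptions for illustration, not the paper's exact method.
import numpy as np
from collections import Counter
from sklearn.svm import SVC


def train_image_model(features, labels):
    """Discriminative appearance model: SVM with Platt-scaled
    probabilistic outputs (cf. Platt, 1999). labels are 1 for the
    target class, 0 otherwise."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(features, labels)
    return clf


def train_text_model(wikipedia_text, vocabulary, mu=2000.0):
    """Generative text model: unigram language model over the class's
    Wikipedia page, Dirichlet-smoothed against a uniform background
    (cf. Zhai & Lafferty, 2004)."""
    counts = Counter(w for w in wikipedia_text.lower().split() if w in vocabulary)
    total = sum(counts.values())
    background = 1.0 / len(vocabulary)
    return {w: (counts.get(w, 0) + mu * background) / (total + mu)
            for w in vocabulary}


def text_log_likelihood(page_words, lm):
    """Log-probability of the words surrounding an image under the class
    language model; out-of-vocabulary words are simply skipped."""
    return sum(np.log(lm[w]) for w in page_words if w in lm)


def score_web_image(image_feature, page_words, image_model, text_lm):
    """Fuse the two cues; adding log-scores is an assumed fusion rule."""
    # Column 1 of predict_proba is P(class = 1) for 0/1 labels.
    p_visual = image_model.predict_proba(image_feature.reshape(1, -1))[0, 1]
    return np.log(p_visual + 1e-12) + text_log_likelihood(page_words, text_lm)

Candidate images harvested for a class would then be sorted by score_web_image; the choice of an SVM and a smoothed unigram model simply follows the kinds of discriminative and generative components cited in the reference list.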

[1] David A. Forsyth et al. Matching Words and Pictures, 2003, J. Mach. Learn. Res.

[2] Cordelia Schmid et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3] Pietro Perona et al. Learning object categories from Google's image search, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1.

[4] James Ze Wang et al. Content-based image retrieval: approaches and trends of the new age, 2005, MIR '05.

[5] Peter G. B. Enser et al. Visual image retrieval: seeking the alliance of concept-based and content-based paradigms, 2000, J. Inf. Sci.

[6] Antonio Criminisi et al. Harvesting Image Databases from the Web, 2007, IEEE 11th International Conference on Computer Vision.

[7] Rajat Raina et al. Self-taught learning: transfer learning from unlabeled data, 2007, ICML '07.

[8] Michael I. Jordan et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[9] James Ze Wang et al. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Antonio Torralba et al. LabelMe: A Database and Web-Based Tool for Image Annotation, 2008, International Journal of Computer Vision.

[11] Peter G. B. Enser et al. Analysis of user need in image archives, 1997, J. Inf. Sci.

[12] Jitendra Malik et al. Color- and texture-based image segmentation using EM and its application to content-based image retrieval, 1998, Sixth International Conference on Computer Vision.

[13] R. Manmatha et al. Automatic Image Annotation and Retrieval using Cross-Media Relevance Models, 2003.

[14] G. Griffin et al. Caltech-256 Object Category Dataset, 2007.

[15] Robert L. Mercer et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, Computational Linguistics.

[16] John Platt et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[17] David A. Forsyth et al. Animals on the Web, 2006, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18] Nicu Sebe et al. Content-based multimedia information retrieval: State of the art and challenges, 2006, TOMCCAP.

[19] Yixin Chen et al. Image Categorization by Learning and Reasoning with Regions, 2004, J. Mach. Learn. Res.

[20] Melita Hajdinjak et al. The PARADISE Evaluation Framework: Issues and Findings, 2006, Computational Linguistics.

[21] Pietro Perona et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, Conference on Computer Vision and Pattern Recognition Workshop.

[22] Trevor Darrell et al. Learning Visual Representations using Images with Captions, 2007, IEEE Conference on Computer Vision and Pattern Recognition.

[23] Takeo Kanade et al. Intelligent Access to Digital Video: Informedia Project, 1996, Computer.

[24] David A. Forsyth et al. Clustering art, 2001, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).

[25] David A. Forsyth et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, 2002, ECCV.

[26] Michael I. Jordan et al. Modeling annotated data, 2003, SIGIR.

[27] Zhu Liu et al. Multimedia content analysis using both audio and visual clues, 2000, IEEE Signal Process. Mag.

[28] Gustavo Carneiro et al. Supervised Learning of Semantic Classes for Image Annotation and Retrieval, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] ChengXiang Zhai et al. A study of smoothing methods for language models applied to information retrieval, 2004, TOIS.

[30] Oded Maron et al. Multiple-Instance Learning for Natural Scene Classification, 1998, ICML.

[31] R. Manmatha et al. A Model for Learning the Semantics of Pictures, 2003, NIPS.

[32] David G. Lowe et al. Object recognition from local scale-invariant features, 1999, Seventh IEEE International Conference on Computer Vision.

[33] James Ze Wang et al. The story picturing engine: finding elite images to illustrate a story using mutual reinforcement, 2004, MIR '04.

[34] David Salesin et al. Fast multiresolution image querying, 1995, SIGGRAPH.