Harvesting Image Databases from the Web

The objective of this work is to automatically generate a large number of images for a specified object class (for example, penguin). A multi-modal approach employing both text, meta data and visual features is used to gather many, high-quality images from the Web. Candidate images are obtained by a text based Web search querying on the object identifier (the word penguin). The Web pages and the images they contain are downloaded. The task is then to remove irrelevant images and re-rank the remainder. First, the images are re-ranked using a Bayes posterior estimator trained on the text surrounding the image and meta data features (such as the image alternative tag, image title tag, and image filename). No visual information is used at this stage. Second, the top-ranked images are used as (noisy) training data and a SVM visual classifier is learnt to improve the ranking further. The principal novelty is in combining text/meta-data and visual features in order to achieve a completely automatic ranking of the images. Examples are given for a selection of animals (e.g. camels, sharks, penguins), vehicles (cars, airplanes, bikes) and other classes (guitar, wristwatch), totalling 18 classes. The results are assessed by precision/recall curves on ground truth annotated data and by comparison to previous approaches including those of Berg et al. (on an additional six classes) and Fergus et al.

[1]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[2]  Michael J. Swain,et al.  WebSeer: An Image Search Engine for the World Wide Web , 1996 .

[3]  Katharina Morik,et al.  Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring , 1999, ICML.

[4]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[5]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[6]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[7]  Javed A. Aslam,et al.  Models for metasearch , 2001, SIGIR '01.

[8]  Rong Jin,et al.  Web image retrieval re-ranking with relevance model , 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).

[9]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[10]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Andrew Zisserman,et al.  An Affine Invariant Salient Region Detector , 2004, ECCV.

[13]  Tamara L. Berg,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[15]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[16]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[17]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[18]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[21]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[22]  David A. Forsyth,et al.  Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, ICCV.

[24]  Gang Wang,et al.  OPTIMOL: automatic Online Picture collecTion via Incremental MOdel Learning , 2007, CVPR.

[25]  Gang Wang,et al.  Object image retrieval by exploiting online knowledge resources , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Fei-Fei Li,et al.  Towards Scalable Dataset Construction: An Active Learning Approach , 2008, ECCV.

[27]  Bernt Schiele,et al.  Decomposition, discovery and detection of visual categories using topic models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Kristen Grauman,et al.  Keywords to visual categories: Multiple-instance learning forweakly supervised object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Trevor Darrell,et al.  Unsupervised Learning of Visual Sense Models for Polysemous Words , 2008, NIPS.

[30]  Florian Schroff,et al.  Semantic Image Segmentation and Web-Supervised Visual Learning , 2009 .