Harvesting Training Images for Fine-Grained Object Categories Using Visual Descriptions

We harvest training images for visual object recognition by casting the problem as an information retrieval (IR) task. In contrast to previous work, we concentrate on fine-grained object categories, such as the many individual animal subspecies, for which manual annotation is expensive. We use ‘visual descriptions’ from nature guides as a novel augmentation to the well-established use of category names. We use these descriptions both in the query process, to find candidate category images, and in image reranking, where an image is ranked more highly if the web-page text surrounding it is similar to the visual descriptions. We show the potential of this method by harvesting images for 10 butterfly categories: compared to a baseline that relies on the category name alone, using visual descriptions improves precision for many categories.
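The reranking step can be pictured as a text-similarity scoring of each candidate image's surrounding web-page text against the category's visual description. The following is a minimal sketch of that idea, assuming a TF-IDF representation with cosine similarity and a hypothetical `candidates` data layout; it is an illustration, not the exact formulation used in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank_by_description(candidates, visual_description):
    """Rank candidate images by how closely the text surrounding each image
    on its source web page matches the category's visual description.

    `candidates` is assumed to be a list of dicts with a 'surrounding_text'
    field; this layout is hypothetical and used only for illustration.
    """
    surrounding_texts = [c["surrounding_text"] for c in candidates]
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the description together with all surrounding texts so that
    # every document is represented in the same vocabulary.
    tfidf = vectorizer.fit_transform([visual_description] + surrounding_texts)
    # Similarity of each candidate's surrounding text to the description.
    scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    # Higher similarity to the visual description -> higher rank.
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
```

In a harvesting pipeline, such a score would typically be combined with, or used to reorder, the initial search-engine ranking, so that images from pages whose text reads like a field-guide description of the species are preferred.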
