Autonomous visual model building based on image crawling through internet search engines

In this paper, we propose an autonomous learning scheme that builds visual semantic concept models from the output of Internet search engines, without any manual labeling. First, images are gathered by crawling the Internet with a search engine such as Google. We then model the search results as "Quasi-Positive Bags" within the Multiple-Instance Learning (MIL) framework, an extension we call generalized MIL (GMIL). We propose an algorithm called "Bag K-Means" to find the maximum of Diverse Density (DD) when no negative bags are available; its cost function takes the form of K-Means with a special "Bag Distance". We also propose "Uncertain Labeling Density" (ULD), which describes the target density distribution of instances in the case of quasi-positive bags, and present a "Bag Fuzzy K-Means" algorithm to find the maximum of ULD. Using generalized MIL with ULD, the model for a particular concept is learned from the images crawled via Internet search engines. Experiments show that our algorithm learns correct models for the concepts of interest and improves accuracy over the original Google Image Search results.
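
To make the idea of a K-Means-style cost over bags concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not define "Bag Distance", so the sketch assumes it is the squared Euclidean distance from a candidate target point to the nearest instance in a bag; the function names `bag_distance` and `bag_kmeans` and the single-target setup are hypothetical. Under that assumption, alternating between picking each bag's nearest instance and averaging the picks reduces the sum of bag distances, much as ordinary K-Means reduces its own cost.

```python
from typing import List, Sequence

Point = Sequence[float]
Bag = List[Point]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_distance(t: Point, bag: Bag) -> float:
    """ASSUMED bag distance: squared distance from t to the nearest instance."""
    return min(sq_dist(t, x) for x in bag)


def bag_kmeans(bags: List[Bag], init: Point, iters: int = 50) -> List[float]:
    """Search for one target point that is close to at least one instance per bag.

    Hypothetical single-cluster variant: no negative bags are used.
    """
    t = list(init)
    for _ in range(iters):
        # Assignment step: each bag is represented by its instance nearest to t.
        nearest = [min(bag, key=lambda x: sq_dist(t, x)) for bag in bags]
        # Update step: move t to the mean of the selected instances.
        new_t = [sum(col) / len(nearest) for col in zip(*nearest)]
        if new_t == t:  # selections and mean no longer change
            break
        t = new_t
    return t


# Toy data: every bag holds one instance near the origin plus one outlier.
bags = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(0.2, 0.0), (9.0, 1.0)],
    [(0.1, 0.3), (-7.0, 4.0)],
]
target = bag_kmeans(bags, init=(1.0, 1.0))
cost = sum(bag_distance(target, bag) for bag in bags)
```

This hard-assignment sketch forces every bag to contribute one instance, so a quasi-positive bag that contains no true positive would pull the estimate off target. The abstract's "Bag Fuzzy K-Means" with ULD is aimed at that case, but its update rules are not given here and are not reproduced.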
