A Novel Approach for Filtering Junk Images from Google Search Results

Keyword-based image search engines such as Google Images are now very popular for accessing the large number of images on the Internet. Because only the text that is directly or indirectly linked to the images is used for image indexing and retrieval, most existing image search engines, including Google Images, may return many junk images that are irrelevant to the given queries. To filter out the junk images from Google Images, we have developed a kernel-based image clustering technique that partitions the images returned by Google Images into multiple visually similar clusters. In addition, users can provide feedback to update the underlying kernels so that they characterize the diversity of visual similarities between the images more accurately. To help users assess the quality of the image kernels and the relevance of the returned images, a novel framework is developed to visualize large numbers of returned images more intuitively according to their visual similarity. Experiments on diverse queries on Google Images have shown that our proposed algorithm can filter out the junk images effectively. An online demo is also available for public evaluation at: http://www.cs.uncc.edu/~jfan/google-demo/.
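The clustering idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the kernels, the visual features, or the clustering algorithm, so the sketch assumes a single RBF kernel over toy 2-D feature vectors and uses plain kernel k-means with farthest-point seeding. All function names (`rbf_kernel`, `kernel_kmeans`, `keep_relevant_clusters`) are hypothetical. The final step, which keeps only the clusters containing an image the user marked as relevant, is a simplification; the feedback-driven kernel update mentioned in the abstract is not shown.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF similarity between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(features, gamma=1.0):
    """Pairwise kernel matrix K[i][j] over all images."""
    n = len(features)
    return [[rbf_kernel(features[i], features[j], gamma) for j in range(n)]
            for i in range(n)]

def farthest_point_seeds(K, k):
    """Deterministic seeding: start at image 0, then repeatedly pick the
    image least similar to every seed chosen so far."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(K)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(K[i][s] for s in seeds)))
    return seeds

def kernel_kmeans(K, k, max_iter=50):
    """Kernel k-means on a precomputed Gram matrix; returns one label per image."""
    n = len(K)
    seeds = farthest_point_seeds(K, k)
    # Initial assignment: each image joins its most similar seed.
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise similarity inside each cluster (None if empty).
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else None
                  for m in clusters]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue
                cross = sum(K[i][j] for j in members) / len(members)
                # Squared distance to the cluster mean in kernel feature space.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def keep_relevant_clusters(labels, relevant):
    """Keep images whose cluster contains a user-marked relevant image
    (hypothetical simplification of the feedback step)."""
    keep = {labels[i] for i in relevant}
    return [i for i, c in enumerate(labels) if c in keep]

# Toy features: images 0-2 are visually similar; images 3-5 form a second group.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = kernel_kmeans(gram_matrix(features), k=2)
kept = keep_relevant_clusters(labels, relevant=[0])
print(labels, kept)
```

On this toy input the two groups are far apart in feature space, so the clustering separates them and marking image 0 as relevant keeps only images 0 to 2. Real search results would need actual visual features (for example color or texture descriptors) and a carefully chosen kernel width.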
