Integrating bilingual search results for automatic junk image filtering

Keyword-based image search engines are now very popular for accessing large amounts of Web images on the Internet. Most existing keyword-based image search engines may return large amounts of junk images (which are irrelevant to the given query word), because the text terms that are loosely associated with the Web images are also used for image indexing. The objective of the proposed work is to effectively filter out the junk images from image search results. Therefore, bilingual image search results for the same keyword-based query are integrated to identify the clusters of the junk images and the clusters of the relevant images. Within relevant image clusters, the results are further refined by removing the duplications under a coarse-to-fine structure. Experiments for a large number of bilingual keyword-based queries (5,000 query words) are simultaneously performed on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and our experimental results have shown that integrating bilingual image search results can filter out the junk images effectively.

[1]  Yixin Chen,et al.  CLUE: cluster-based retrieval of images by unsupervised learning , 2005, IEEE Transactions on Image Processing.

[2]  David A. Forsyth,et al.  Discriminating Image Senses by Clustering with Multimodal Features , 2006, ACL.

[3]  David A. Forsyth,et al.  Clustering art , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[5]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[6]  Pamela R. Lipson,et al.  Context and configuration based scene classification , 1996 .

[7]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[9]  Edward Y. Chang,et al.  Enhancing DPF for near-replica image recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[10]  Hung-Khoon Tan,et al.  Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context , 2009, IEEE Transactions on Multimedia.

[11]  Tao Qin,et al.  Web image clustering by consistent utilization of visual features and surrounding texts , 2005, MULTIMEDIA '05.

[12]  Alan Hanjalic,et al.  Supervised reranking for web image search , 2010, ACM Multimedia.

[13]  Yan Ke,et al.  An efficient parts-based near-duplicate and sub-image retrieval system , 2004, MULTIMEDIA '04.

[14]  Stefano Soatto,et al.  Filtering Internet image search results towards keyword based category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Jianping Fan,et al.  Leveraging large-scale weakly-tagged images to train inter-related classifiers for multi-label annotation , 2009, LS-MMRM '09.

[16]  Thomas S. Huang,et al.  Small sample learning during multimedia retrieval using BiasMap , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[17]  Shih-Fu Chang,et al.  Detection of non-identical duplicate consumer photographs , 2003, Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint.

[18]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Xiaofei He,et al.  K-way min-max cut for image clustering and junk images filtering from Google images , 2010, ACM Multimedia.

[20]  Xuelong Li,et al.  Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm , 2006, IEEE Transactions on Multimedia.

[21]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[22]  Jianping Fan,et al.  A Novel Approach for Filtering Junk Images from Google Search Results , 2008, MMM.

[23]  Bin Wang,et al.  Large-Scale Duplicate Detection for Web Image Search , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[24]  Wei-Ying Ma,et al.  Multi-model similarity propagation and its application for web image retrieval , 2004, MULTIMEDIA '04.

[25]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[26]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[28]  Wei-Ying Ma,et al.  Learning and inferring a semantic space from user's relevance feedback for image retrieval , 2002, MULTIMEDIA '02.

[29]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[30]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[31]  W. Eric L. Grimson,et al.  Configuration based scene classification and image indexing , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[33]  Chong-Wah Ngo,et al.  Towards optimal bag-of-features for object categorization and semantic video retrieval , 2007, CIVR '07.

[34]  Chris H. Q. Ding,et al.  A min-max cut algorithm for graph partitioning and data clustering , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[35]  Wei-Ying Ma,et al.  Hierarchical clustering of WWW image search results using visual, textual and link information , 2004, MULTIMEDIA '04.

[36]  Shumeet Baluja,et al.  Pagerank for product image search , 2008, WWW.

[37]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[38]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Jing Hua,et al.  Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering , 2008, WWW.

[40]  Shih-Fu Chang,et al.  Detecting image near-duplicate by stochastic attributed relational graph matching with learning , 2004, MULTIMEDIA '04.