Generating Visual Concept Network from Large-Scale Weakly-Tagged Images

When dealing with large-scale online image collections, it is attractive to incorporate a visual concept network for image summarization, organization, and exploration. In this paper, we develop an automatic algorithm for generating a visual concept network by determining the diverse visual similarity contexts between image concepts. To learn more reliable inter-concept visual similarity contexts, images with diverse visual properties are crawled from multiple sources, and multiple kernels are combined to characterize the diverse visual similarity contexts between the images and to handle sparse image distributions more effectively in the high-dimensional multi-modal feature space. Kernel canonical correlation analysis (KCCA) is used to characterize the diverse inter-concept visual similarity contexts more accurately, so that the visual concept network is more coherent with human perception. A similarity-preserving visualization technique for the visual concept network is developed to assist users in assessing the coherence between their perceptions and the inter-concept visual similarity contexts determined by our algorithm. Experiments on large-scale image collections have yielded very good results.
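As a rough illustration of the two ingredients named above, the sketch below combines several base kernels into one mixed kernel and then scores the visual similarity of two concepts by their first regularized kernel canonical correlation. It is not the paper's implementation. Everything specific in it is an assumption made for the example: the RBF base kernels, the fixed kernel weights (the abstract does not say how weights are chosen), the regularizer `kappa`, the feature block names, the random stand-in features, and the requirement that both concepts contribute the same number of paired images.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF Gram matrix for the rows of X (one image per row)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def mixed_kernel(feature_blocks, weights, gammas):
    """Weighted sum of one RBF kernel per feature block (weights are assumed)."""
    return sum(w * rbf_kernel(F, g)
               for F, w, g in zip(feature_blocks, weights, gammas))

def center(K):
    """Center a Gram matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def _shrunk_sqrt(K, kappa):
    """Symmetric square root of K (K + kappa I)^-1 via eigendecomposition."""
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)  # remove tiny negative eigenvalues
    return (U * np.sqrt(s / (s + kappa))) @ U.T

def kcca_similarity(Kx, Ky, kappa=0.1):
    """First regularized kernel canonical correlation, a value in [0, 1).

    The squared correlations are the eigenvalues of
    (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx; the largest correlation is
    obtained here as the top singular value of an equivalent symmetric form.
    """
    Px = _shrunk_sqrt(center(Kx), kappa)
    Py = _shrunk_sqrt(center(Ky), kappa)
    return float(np.linalg.svd(Py @ Px, compute_uv=False)[0])

# Hypothetical data: 20 images per concept, two feature blocks each.
rng = np.random.default_rng(0)
color_a, texture_a = rng.normal(size=(20, 8)), rng.normal(size=(20, 16))
color_b, texture_b = rng.normal(size=(20, 8)), rng.normal(size=(20, 16))

weights, gammas = [0.5, 0.5], [0.1, 0.05]
K_a = mixed_kernel([color_a, texture_a], weights, gammas)
K_b = mixed_kernel([color_b, texture_b], weights, gammas)

print("similarity(A, B) =", kcca_similarity(K_a, K_b))
```

Computing this score for every pair of concepts would give a weighted graph over the concepts, which is one plausible reading of how a concept network could be assembled. With small samples the score is sensitive to `kappa`, so values are only comparable when the same regularizer is used for every pair.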
