Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization

Massive image collections are increasingly available on the Web. These collections often include complementary non-visual data such as text descriptions, comments, user ratings and tags. These additional data modalities may provide a semantic complement to the visual content of the images, which could improve performance in different image content analysis tasks. This paper presents a novel method based on non-negative matrix factorization to generate multimodal image representations that integrate visual features and text information. The proposed approach discovers a set of latent factors that correlate multimodal data in the same representation space. We evaluated the potential of this multimodal image representation in various tasks associated with image indexing and search. Experimental results show that, in these tasks, the proposed method clearly outperforms multimodal latent semantic spaces generated by a singular value decomposition.
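
The idea of latent factors shared by both modalities can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the simplest possible fusion (stacking the visual and text feature matrices) and uses the standard multiplicative updates for the Frobenius-norm objective. The function name `nmf`, the toy data and all parameter values are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (features x images) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    rng = np.random.default_rng(seed)
    n_features, n_images = X.shape
    W = rng.random((n_features, k)) + eps  # latent factors (basis)
    H = rng.random((k, n_images)) + eps    # latent representation of each image
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (made up): 6 visual features and 4 text terms for 8 images.
rng = np.random.default_rng(1)
X_visual = rng.random((6, 8))
X_text = rng.random((4, 8))

# Stack the modalities so that each column describes one image in both.
X = np.vstack([X_visual, X_text])
W, H = nmf(X, k=3)

# Each latent factor has a visual part and a text part, so the columns of H
# place every image in one space that relates the two modalities.
W_visual, W_text = W[:6], W[6:]
reconstruction_error = np.linalg.norm(X - W @ H)
```

Under these assumptions, each column of `H` is a 3-dimensional multimodal representation of one image, and it could be compared with other columns for indexing or search.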
