Sparse Linear Integration of Content and Context Modalities for Semantic Concept Retrieval

The semantic gap between low-level visual features and high-level semantics is a well-known challenge in content-based multimedia information retrieval. With the rapid popularization of social media, which allows users to assign tags that describe images and videos, it is natural to exploit these metadata to help bridge the semantic gap. This paper proposes a sparse linear integration (SLI) model that integrates visual content and its associated metadata, referred to as the content and context modalities, respectively, for semantic concept retrieval. An optimization problem is formulated that approximates each instance by a sparse linear combination of other instances while minimizing the reconstruction error between the instance and its approximation. The prediction score of a concept for a test instance measures how well the instance can be reconstructed from the positive instances of that concept. The SLI model is evaluated on two benchmark image data sets and their associated tags. Experimental results show promising performance compared with approaches based on a single modality and with approaches based on popular fusion methods.
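The reconstruction idea described above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the regularizer, the solver, or how the two modalities enter the objective. Here an elastic-net penalty (an L1 term for sparsity plus a small L2 term) is assumed, minimized by cyclic coordinate descent, and the modalities are integrated by stacking the L2-normalized content and context vectors, with the context part scaled by sqrt(beta), so that one shared coefficient vector must reconstruct both modalities at once. Scoring follows one plausible reading of the abstract: the test instance is coded once against all training instances, and a concept's score is the negative squared error of the reconstruction that keeps only that concept's positive instances. The helper names (`fuse`, `sparse_code`, `concept_score`) and parameters (`lam1`, `lam2`, `beta`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else list(v)


def fuse(content: Sequence[float], context: Sequence[float], beta: float = 1.0) -> Vector:
    """Hypothetical integration: stack normalized content and context vectors.

    Scaling the context part by sqrt(beta) makes the squared reconstruction error
    of the stacked vector equal to content_error + beta * context_error.
    """
    scale = math.sqrt(beta)
    return l2_normalize(content) + [scale * t for t in l2_normalize(context)]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sparse_code(x: Sequence[float], train: Sequence[Sequence[float]],
                lam1: float = 0.1, lam2: float = 0.01,
                n_iter: int = 200, tol: float = 1e-8) -> Vector:
    """Cyclic coordinate descent for the (assumed) elastic-net problem

        min_w 0.5 * ||x - sum_j w[j] * train[j]||^2 + lam1 * ||w||_1 + 0.5 * lam2 * ||w||^2
    """
    d = len(x)
    w = [0.0] * len(train)
    residual = list(x)  # x - sum_j w[j] * train[j]; equals x while w is all zeros
    sq_norms = [sum(a * a for a in t) for t in train]
    for _ in range(n_iter):
        max_change = 0.0
        for j, a in enumerate(train):
            if sq_norms[j] == 0.0:
                continue
            old = w[j]
            # Correlation of instance j with the residual that excludes its own contribution.
            rho = sum(a[k] * residual[k] for k in range(d)) + sq_norms[j] * old
            new = soft_threshold(rho, lam1) / (sq_norms[j] + lam2)
            delta = new - old
            if delta != 0.0:
                for k in range(d):
                    residual[k] -= delta * a[k]
                w[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def concept_score(x: Sequence[float], train: Sequence[Sequence[float]],
                  labels: Sequence[int], w: Sequence[float]) -> float:
    """Negative squared error of reconstructing x from the positive instances only."""
    recon = [0.0] * len(x)
    for a, y, coef in zip(train, labels, w):
        if y:
            for k in range(len(x)):
                recon[k] += coef * a[k]
    return -sum((xk - rk) ** 2 for xk, rk in zip(x, recon))


if __name__ == "__main__":
    # Toy data: each instance has 2 content features and 2 tag (context) features.
    raw = [([1.0, 0.0], [1.0, 0.0]), ([0.9, 0.1], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
    train = [fuse(c, t) for c, t in raw]
    labels = {"concept_a": [1, 1, 0], "concept_b": [0, 0, 1]}
    test = fuse([0.95, 0.05], [1.0, 0.0])
    w = sparse_code(test, train)
    for name, y in labels.items():
        print(name, round(concept_score(test, train, y, w), 4))
```

Because the L1 term drives many coefficients to exactly zero, each test instance is explained by a small subset of training instances; a concept whose positive instances carry most of that weight reconstructs the test instance well and therefore receives a higher (less negative) score.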
