Fast Multi-Modal Unified Sparse Representation Learning

Exploiting feature sets from different modalities can significantly improve recognition accuracy. Learning a unified representation of an object from its representations in different modalities (e.g., image, text, audio) has been a popular problem in the multimedia retrieval literature. In this paper, we introduce a new iterative algorithm that learns a sparse unified representation with higher accuracy and in fewer iterations than previously reported methods. Our algorithm employs a new fixed-point iterative scheme combined with an inertial step. To obtain a more discriminative representation, we also impose a regularization term that exploits the label information in the datasets. Experimental results on two real benchmark datasets demonstrate the efficacy of our method in terms of both iteration count and accuracy.
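To make the setting concrete, the following is a minimal sketch of learning one sparse code matrix shared by all modalities with an inertial proximal-gradient loop. Everything here is an illustrative assumption rather than the method described above: the objective (a least-squares fit per modality against fixed dictionaries `D_m` plus an ℓ1 penalty), the hypothetical names `unified_sparse_codes`, `soft_threshold` and `lam`, and the FISTA-style extrapolation coefficient (k-1)/(k+2), which stands in for the paper's own fixed-point scheme. The label-based regularizer is omitted.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: element-wise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def unified_sparse_codes(X_list, D_list, lam=0.1, n_iter=300):
    """Hypothetical formulation: one sparse code A shared by all modalities,

        min_A  0.5 * sum_m ||X_m - D_m A||_F^2 + lam * ||A||_1,

    where X_m has shape (d_m, n) and the fixed dictionary D_m has shape (d_m, k).
    Solved with an inertial (FISTA-style) proximal-gradient iteration.
    """
    # Stacking the modalities turns the sum of fits into one least-squares term.
    X = np.vstack(X_list)
    D = np.vstack(D_list)
    # Step 1/L, where L = sigma_max(D)^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    A = A_prev = np.zeros((D.shape[1], X.shape[1]))
    for k in range(1, n_iter + 1):
        beta = (k - 1) / (k + 2)          # inertial (extrapolation) coefficient
        Y = A + beta * (A - A_prev)       # inertial step
        grad = D.T @ (D @ Y - X)          # gradient of the smooth data-fit term
        A_prev, A = A, soft_threshold(Y - step * grad, step * lam)
    return A


# Usage: two synthetic "modalities" generated from the same sparse code.
rng = np.random.default_rng(0)
D_img, D_txt = rng.standard_normal((20, 10)), rng.standard_normal((15, 10))
A_true = np.zeros((10, 5))
A_true[:3] = rng.standard_normal((3, 5))
X_img, X_txt = D_img @ A_true, D_txt @ A_true
A = unified_sparse_codes([X_img, X_txt], [D_img, D_txt], lam=0.01)
```

Stacking the modality blocks is only one way to share the code across modalities; a per-modality weighting of the fit terms would be a natural variation, but it is not shown here.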
