Multimodal Task-Driven Dictionary Learning for Image Classification

Dictionary learning algorithms have been successfully used for both reconstructive and discriminative tasks, where an input signal is represented as a sparse linear combination of dictionary atoms. While these methods have mostly been developed for single-modality scenarios, recent studies have demonstrated the advantages of feature-level fusion based on the joint sparse representation of multimodal inputs. In this paper, we propose a multimodal task-driven dictionary learning algorithm under the joint sparsity constraint (prior) to enforce collaboration among multiple homogeneous or heterogeneous sources of information. In this task-driven formulation, the multimodal dictionaries are learned simultaneously with their corresponding classifiers. The resulting multimodal dictionaries can generate discriminative latent features (sparse codes) from the data that are optimized for a given task, such as binary or multiclass classification. Moreover, we present an extension of the proposed formulation using a mixed joint and independent sparsity prior, which facilitates more flexible fusion of the modalities at the feature level. The efficacy of the proposed algorithms for multimodal classification is illustrated on four different applications: multimodal face recognition, multi-view face recognition, multi-view action recognition, and multimodal biometric recognition. It is also shown that, compared with their reconstructive dictionary learning counterparts, the task-driven formulations are more computationally efficient in the sense that they can be equipped with more compact dictionaries and still achieve superior performance.
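
To make the joint sparsity prior concrete, the following is a minimal sketch of joint sparse coding with fixed dictionaries. It is not the task-driven learning algorithm proposed here. It assumes the prior is expressed as a row-wise l1/l2 penalty on the matrix of sparse codes (one column per modality), which is a common formulation, and it solves the resulting problem with plain proximal gradient descent. The function names `row_soft_threshold` and `joint_sparse_code`, the parameter `lam`, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np

def row_soft_threshold(A, tau):
    """Proximal operator of tau * (sum of row l2 norms): shrink each row as a group."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    # Rows with norm <= tau are zeroed; the rest are scaled toward zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return A * scale

def joint_sparse_code(xs, Ds, lam, n_iter=200):
    """Approximately minimize 0.5 * sum_s ||x_s - D_s a_s||^2 + lam * sum_j ||A[j, :]||_2.

    xs: list of S observation vectors, one per modality (assumed input layout).
    Ds: list of S dictionaries, each with the same number of atoms d.
    Returns A of shape (d, S); column s holds the sparse code of modality s.
    """
    d, S = Ds[0].shape[1], len(Ds)
    A = np.zeros((d, S))
    # The largest squared spectral norm bounds the Lipschitz constant of the gradient.
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2 for D in Ds)
    for _ in range(n_iter):
        # Gradient of the reconstruction term, computed separately per modality.
        G = np.column_stack(
            [D.T @ (D @ A[:, s] - x) for s, (x, D) in enumerate(zip(xs, Ds))]
        )
        # Gradient step followed by row-wise shrinkage couples the modalities.
        A = row_soft_threshold(A - step * G, step * lam)
    return A
```

Because the shrinkage acts on whole rows of the code matrix, an atom index tends to be either active in every modality or inactive in all of them, which is how this kind of penalty encourages the modalities to share a support.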
