Sparse Methods for Robust and Efficient Visual Recognition

Title of dissertation: Sparse Methods for Robust and Efficient Visual Recognition
Sumit Shekhar, Doctor of Philosophy, 2014
Dissertation directed by: Professor Rama Chellappa, Department of Electrical and Computer Engineering

Visual recognition has been a subject of extensive research in computer vision. A vast literature exists on feature extraction and learning methods for recognition. However, due to large variations in visual data, robust visual recognition is still an open problem. In recent years, sparse representation-based methods have become popular for visual recognition. By learning a compact dictionary of data and exploiting the notion of sparsity, state-of-the-art results have been obtained on many recognition tasks. However, existing data-driven sparse model techniques may not be optimal for some challenging recognition problems. In this dissertation, we consider some of these recognition tasks and present approaches based on sparse coding for robust and efficient recognition in such cases.

First, we study the problem of low-resolution face recognition. This is a challenging problem, and methods have been proposed using super-resolution and machine learning-based techniques. However, these methods cannot handle variations such as illumination changes, which can occur at low resolutions and degrade performance. We propose a generative approach for classifying low-resolution faces by exploiting 3D face models. Further, we propose a joint sparse coding framework for robust classification at low resolutions. The effectiveness of the method is demonstrated on different face datasets.

In the second part, we study a robust feature-level fusion method for multimodal biometric recognition. Although score-level and decision-level fusion methods exist in the biometric literature, feature-level fusion is challenging due to the different output formats of biometric modalities. In this work, we propose a novel sparse representation-based method for multimodal fusion, and present experimental results for a large multimodal dataset. Robustness to noise and occlusion is demonstrated.

In the third part, we consider the problem of domain adaptation, where we want to learn effective classifiers for cases where the test images come from a different distribution than the training data. Typically, due to the high cost of human annotation, very few labeled samples are available for images in the test domain. Specifically, we study the problem of adapting sparse dictionary-based classification methods for such cases. We describe a technique which jointly learns projections of data in the two domains, and a latent dictionary which can succinctly represent both domains in the projected low-dimensional space. The proposed method is efficient and performs on par with or better than many competing state-of-the-art methods.

Lastly, we study an emerging analysis framework of sparse coding for image classification. We show that analysis sparse coding can give performance similar to that of typical synthesis sparse coding methods, while being much faster at sparse encoding. Finally, we conclude the dissertation with discussions and possible future directions.
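The speed difference between synthesis and analysis sparse coding mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `omp` and `analysis_code`, the random dictionary `D`, the random operator `W`, and the sparsity level `k` are all assumptions chosen for illustration. Synthesis coding is shown with a plain orthogonal matching pursuit loop, and analysis coding with one common form, hard thresholding of the operator response.

```python
import numpy as np

def omp(D, x, k):
    """Greedy synthesis sparse coding: pick up to k atoms of D to approximate x."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def analysis_code(W, x, k):
    """Analysis-style coding: apply operator W, keep the k largest responses."""
    z = W @ x
    keep = np.argsort(np.abs(z))[-k:]
    out = np.zeros_like(z)
    out[keep] = z[keep]
    return out

# Hypothetical data: random unit-norm dictionary, random analysis operator.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
W = rng.standard_normal((50, 20))
x = rng.standard_normal(20)

alpha = omp(D, x, k=5)           # iterative: one least-squares solve per atom
z = analysis_code(W, x, k=5)     # one matrix-vector product plus a sort
```

In this sketch the synthesis code needs an iterative pursuit for every signal, whereas the analysis code is obtained from a single matrix-vector product followed by thresholding, which is one way to see why encoding can be much cheaper in the analysis setting.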
