Coherent and Noncoherent Dictionaries for Action Recognition

In this letter, we propose sparsity-based coherent and noncoherent dictionaries for action recognition. First, the input data are divided into clusters, where the number of clusters depends on the number of action categories. Within each cluster, we seek the data items of each action category. If the number of data items of an action category exceeds a threshold, these items are labeled as coherent. All coherent data items from the different clusters then form the coherent group of each action category, and data items that are not part of the coherent group belong to the noncoherent group of that category. Dictionaries for these coherent and noncoherent groups are learned using K-singular value decomposition (K-SVD) dictionary learning. Since the data in the coherent group are more similar, only a few atoms need to be learned. In the noncoherent group, there is high variability among the data items, so we propose an orthogonal-projection-based selection to obtain an optimal dictionary that retains maximum variance in the data. Finally, the dictionary atoms obtained for both groups in each action category are combined and then updated using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization algorithm. Experiments are conducted on the challenging HMDB51 and UCF50 datasets with action bank features and achieve comparable results using this state-of-the-art feature.
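The grouping step described in the abstract can be illustrated with a minimal sketch. It assumes that cluster assignments are already available (for example, from k-means) and that the threshold is a simple item count per cluster and category; the abstract does not give the exact rule. The function name `split_coherent` and its parameters are hypothetical and are not taken from the letter.

```python
from collections import defaultdict


def split_coherent(cluster_ids, labels, threshold):
    """Partition sample indices into coherent and noncoherent groups.

    Assumption (not specified in the abstract): an action category is
    "coherent" inside a cluster when it has more than `threshold` items there.
    """
    # Collect sample indices for every (cluster, action category) pair.
    cells = defaultdict(list)
    for idx, (cluster, label) in enumerate(zip(cluster_ids, labels)):
        cells[(cluster, label)].append(idx)

    coherent = defaultdict(list)
    noncoherent = defaultdict(list)
    for (_cluster, label), members in cells.items():
        # Items of a well-represented category join its coherent group;
        # all remaining items of that category fall into the noncoherent group.
        target = coherent if len(members) > threshold else noncoherent
        target[label].extend(members)

    return (
        {label: sorted(items) for label, items in coherent.items()},
        {label: sorted(items) for label, items in noncoherent.items()},
    )


# Toy example: eight samples, two clusters, two action categories.
cluster_ids = [0, 0, 0, 1, 1, 0, 1, 1]
labels = ["run", "run", "run", "run", "jump", "jump", "jump", "jump"]
coherent, noncoherent = split_coherent(cluster_ids, labels, threshold=2)
print(coherent)
print(noncoherent)
```

In a full pipeline, each per-category group would then be passed to a dictionary learner, with a small number of atoms for the coherent group and a larger, variance-preserving selection for the noncoherent group; those steps are outside the scope of this sketch.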
