Unsupervised learning of finite full covariance multivariate generalized Gaussian mixture models for human activity recognition

In this paper, we propose to recognize human activities through unsupervised learning of a finite multivariate generalized Gaussian mixture model. We address an important issue in finite mixture modeling: estimating the mixture model's parameters in the full covariance matrix case. We develop a novel learning algorithm based on a fixed-point covariance matrix estimator combined with the Expectation-Maximization algorithm. Furthermore, we propose an appropriate minimum message length (MML) criterion to deal with the model selection problem. We evaluate the proposed method on synthetic datasets and on a challenging application, namely human activity recognition from images and videos. The obtained results clearly show the merits of the proposed framework, which, with a full covariance matrix, is better able to model correlated data.
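As an illustration of the kind of fixed-point covariance update mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean component with density proportional to |Σ|^(-1/2) exp(-½ (xᵀ Σ⁻¹ x)^β) and a known shape parameter β; under these assumptions, setting the gradient of the weighted log-likelihood to zero gives Σ = β Σᵢ wᵢ uᵢ^(β-1) xᵢ xᵢᵀ / Σᵢ wᵢ with uᵢ = xᵢᵀ Σ⁻¹ xᵢ, which the code iterates. The function name `fixed_point_scatter`, its parameters, and the use of `weights` as EM responsibilities are hypothetical choices made for this example.

```python
import numpy as np


def fixed_point_scatter(x, beta, weights=None, n_iter=200, tol=1e-10):
    """Iterate Sigma <- beta * sum_i w_i u_i^(beta-1) x_i x_i^T / sum_i w_i.

    x       : (N, d) array of centered observations (zero mean is assumed).
    beta    : shape parameter, assumed known; beta = 1 is the Gaussian case.
    weights : optional (N,) responsibilities from an E-step (hypothetical use).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    sigma = np.eye(d)  # simple starting point, not prescribed by the source
    for _ in range(n_iter):
        # u_i = x_i^T Sigma^{-1} x_i for every observation
        u = np.einsum("ij,ij->i", x @ np.linalg.inv(sigma), x)
        u = np.maximum(u, 1e-12)  # guard against 0 ** negative power
        coef = w * beta * u ** (beta - 1.0)
        new_sigma = (x * coef[:, None]).T @ x / w.sum()
        converged = np.linalg.norm(new_sigma - sigma) <= tol * np.linalg.norm(sigma)
        sigma = new_sigma
        if converged:
            break
    return sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
    data = rng.multivariate_normal(np.zeros(2), true_cov, size=2000)
    print(fixed_point_scatter(data, beta=1.0))
```

With β = 1 the coefficient uᵢ^(β-1) equals 1, so the update reduces to the (weighted) sample covariance of the centered data, which is a convenient sanity check. Inside an EM loop, one such update per component would use that component's responsibilities as `weights`; estimating the mean and β is outside the scope of this sketch.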