Distribution-Sensitive Learning on Relevance Vector Machine for Pose-Based Human Gesture Recognition

Abstract Many real-world gesture datasets inherently contain an unbalanced number of poses across classes. Such imbalance severely reduces the performance of bag-of-poses based classification. At the same time, collecting a dataset of human gestures or actions is expensive and time-consuming, so it is often impractical to reacquire the data or to modify the existing dataset through oversampling or undersampling. The best way to handle such imbalance is to make the classifier itself aware of, and adaptive to, the actual class distribution in the data. Balancing the class distribution, i.e., the number of pose samples per class, is a difficult task in machine learning, and standard statistical learning models (e.g., SVM, HMM, CRF) do not account for unbalanced datasets. This paper proposes a distribution-sensitive prior for a standard statistical learning model, the Relevance Vector Machine (RVM), to deal with the imbalanced data problem. The prior analyzes the training dataset before a model is learned. The RVM can thus put more weight on samples from under-represented classes while allowing all samples in the dataset to have a balanced impact on the learning process. Our experiments use a publicly available gesture dataset, the Microsoft Research Cambridge-12 (MSRC-12). Experimental results show the importance of adapting to unbalanced data and that the distribution-sensitive prior improves recognition performance.
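
The idea of giving under-represented classes more weight can be illustrated with a minimal sketch. The abstract does not state the form of the prior, so the inverse-class-frequency weighting below is an assumption chosen for illustration, not the paper's formulation; the function name `class_balanced_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def class_balanced_weights(labels):
    """Return one weight per sample, inversely proportional to its class size.

    Assumption: inverse class frequency stands in for the paper's
    distribution-sensitive prior, whose exact form is not given here.
    """
    counts = Counter(labels)      # number of pose samples per class
    n_samples = len(labels)
    n_classes = len(counts)
    # Every class receives the same total weight (n_samples / n_classes),
    # shared equally among the samples of that class.
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Toy example: 6 poses of one gesture class, 2 poses of another.
labels = ["wave"] * 6 + ["kick"] * 2
weights = class_balanced_weights(labels)
```

With these weights each sample of the smaller class counts more than a sample of the larger class, while the per-class totals are equal and the overall weight still sums to the number of samples. Such weights could then scale each sample's contribution to a classifier's training objective.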
