Bayesian Co-Boosting for Multi-modal Gesture Recognition

With the development of data acquisition equipment, more and more modalities have become available for gesture recognition. However, two critical issues remain for multi-modal gesture recognition: how to select discriminative features for recognition and how to fuse features from different modalities. In this paper, we propose a novel Bayesian Co-Boosting framework for multi-modal gesture recognition. Inspired by boosting and co-training, the proposed framework combines multiple collaboratively trained weak classifiers to construct the final strong classifier for the recognition task. In each iteration, we randomly sample a number of feature subsets and estimate the weak classifier's parameters for each subset. The optimal weak classifier and its corresponding feature subset are retained for constructing the strong classifier. Furthermore, we define an upper bound on the training error and derive an update rule for the instance weights that guarantees this bound is minimized over the iterations. As a demonstration, we present an implementation of the framework that uses hidden Markov models as weak classifiers. We perform extensive experiments on the publicly available ChaLearn MMGR and ChAirGest data sets, on which our approach achieves 97.63% and 96.53% accuracy, respectively.
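The iteration described in the abstract (sample several random feature subsets, fit one weak classifier per subset, keep the best one, then re-weight the training instances) can be illustrated with a minimal sketch. This is not the Bayesian Co-Boosting algorithm itself. It assumes binary labels in {-1, +1}, a single modality, a weighted nearest-centroid rule in place of the hidden Markov model weak classifiers, and the standard discrete AdaBoost weight update in place of the update rule derived in the paper. All function names and parameters (`make_toy_data`, `fit_centroid`, `boost`, `candidates`, `subset_size`) are hypothetical.

```python
import math
import random


def make_toy_data(n=40, seed=1):
    """Toy data (assumption): feature 0 carries the label, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.uniform(-0.5, 0.5)]
                 + [rng.uniform(-0.5, 0.5) for _ in range(3)])
        y.append(label)
    return X, y


def fit_centroid(X, y, w, subset):
    """Weighted class centroids restricted to the chosen feature subset."""
    cents = {}
    for label in (-1, 1):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        cents[label] = [
            sum(wi * xi[j] for wi, xi, yi in zip(w, X, y) if yi == label) / total
            for j in subset
        ]
    return cents


def predict_centroid(cents, subset, x):
    """Assign x to the class whose centroid is nearer on the subset features."""
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(subset, cents[label]))
    return 1 if dist(1) <= dist(-1) else -1


def boost(X, y, rounds=10, candidates=5, subset_size=2, seed=0):
    """AdaBoost-style loop with random feature-subset selection per round."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial instance weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Try several random feature subsets; keep the lowest weighted error.
        for _ in range(candidates):
            subset = rng.sample(range(d), subset_size)
            cents = fit_centroid(X, y, w, subset)
            preds = [predict_centroid(cents, subset, x) for x in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, subset, cents, preds)
        err, subset, cents, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, subset, cents))
        # Standard AdaBoost re-weighting: misclassified instances gain weight.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(a * predict_centroid(c, s, x) for a, s, c in ensemble)
    return 1 if score >= 0 else -1
```

Calling `boost(*make_toy_data())` returns a list of `(alpha, subset, centroids)` triples, one per round, and `predict` combines them into the final vote. Extending the sketch toward the setting in the abstract would mean replacing the centroid rule with sequence models and drawing the candidate subsets from more than one modality.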
