A Hybrid CRF/HMM for One-Shot Gesture Learning

This chapter addresses the characterization and recognition of human gestures in videos. We propose a global characterization of gestures that we call the gesture signature, which describes the location, velocity, and orientation of the global motion of a gesture, deduced from optical flow. We also propose a hybrid CRF/HMM model that combines the modelling ability of hidden Markov models with the discriminative ability of conditional random fields. We apply this hybrid system to gesture recognition in videos in a one-shot learning setting, where only one sample gesture per class is available to train the system. In this rather extreme setting, the proposed framework achieves promising performance, which suggests that it could be applied to other biometric recognition tasks.
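
To make the idea of a global motion descriptor more concrete, the sketch below summarizes a dense optical flow field by a location, a velocity, and an orientation. It is only an illustration under stated assumptions: the abstract does not give the actual definition of the gesture signature, so the magnitude-weighted averaging, the function name `global_motion_descriptor`, and the `min_magnitude` threshold are all hypothetical choices, and the flow field is assumed to be already available as an `(H, W, 2)` array of per-pixel `(dx, dy)` displacements.

```python
import numpy as np


def global_motion_descriptor(flow, min_magnitude=1e-3):
    """Summarize a dense flow field by one location, speed, and orientation.

    Hypothetical illustration, not the chapter's definition.
    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    Returns None when no pixel moves faster than min_magnitude.
    """
    flow = np.asarray(flow, dtype=float)
    magnitude = np.linalg.norm(flow, axis=2)

    # Keep only pixels that actually move (assumed threshold).
    moving = magnitude > min_magnitude
    total = magnitude[moving].sum()
    if total == 0.0:
        return None

    # Weight each moving pixel by its share of the total motion magnitude.
    ys, xs = np.nonzero(moving)
    weights = magnitude[moving] / total

    # Location: magnitude-weighted centroid of the moving pixels (x, y).
    location = (float(np.dot(weights, xs)), float(np.dot(weights, ys)))

    # Velocity and orientation: norm and angle of the weighted mean flow vector.
    mean_flow = (weights[:, None] * flow[moving]).sum(axis=0)
    speed = float(np.linalg.norm(mean_flow))
    orientation = float(np.arctan2(mean_flow[1], mean_flow[0]))

    return {"location": location, "velocity": speed, "orientation": orientation}


if __name__ == "__main__":
    # Synthetic flow: a small patch moving to the right in an otherwise static frame.
    demo_flow = np.zeros((48, 64, 2))
    demo_flow[10:20, 30:40, 0] = 2.0
    print(global_motion_descriptor(demo_flow))
```

Computing one such descriptor per frame would yield a sequence of low-dimensional observations, which is the kind of input a sequence model such as an HMM or a CRF consumes. How the chapter actually builds and quantizes these observations is not specified here.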
