Real-time continuous gesture recognition for natural multimodal interaction

I have developed a real-time continuous gesture recognition system that addresses two important problems that have previously been neglected: (a) smoothly handling two different kinds of gestures, those characterized by distinct paths and those characterized by distinct hand poses; and (b) determining how and when the system should respond to gestures. The novel approaches in this thesis include a probabilistic recognition framework based on a flattened hierarchical hidden Markov model (HHMM) that unifies the recognition of path and pose gestures, and a method that uses information from the hidden states of the HMM to identify the different gesture phases (the pre-stroke, the nucleus and the post-stroke). Knowing the current phase allows the system to respond appropriately both to gestures that require a discrete response and to those that need a continuous response. The system is extensible: a new gesture can be added by recording 3-6 repetitions of it; the system then trains an HMM for the gesture and integrates it into the existing HMM, a process that takes only a few minutes. My evaluation shows that even with a small number of training examples (e.g., 6), the system achieves an average F1 score of 0.805 for the two forms of gestures (path and pose).

To evaluate the performance of my system, I collected a new dataset (the YANG dataset) that includes both path and pose gestures. This combination is currently lacking in the datasets available to the community, and it poses the challenge of recognizing different types of gestures mixed together. I also developed a novel hybrid evaluation metric that is more relevant to real-time interaction with different gesture flows.

Thesis Supervisor: Randall Davis
Title: Professor

Ying Yin
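The abstract names two technical ideas: flattening a two-level hierarchy (gesture, then phase) into a single HMM, and reading the current gesture phase off that HMM's hidden states to decide between a discrete and a continuous response. The following is a minimal Python sketch of both ideas under loudly stated assumptions: toy discrete observation symbols instead of real hand features, hand-set (untrained) probabilities, one hidden state per phase, and plain forward filtering for online decoding. It is not the thesis implementation; `build_flat_hmm`, `decode_online`, `respond`, the symbol alphabet and the two example gestures are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical discrete observation alphabet standing in for continuous
# hand features (position, velocity, hand-pose descriptors).
SYMBOLS = ("still", "raise", "move_right", "open_palm", "lower")


def emission(main, p=0.8):
    """P(symbol | state): mass p on `main`, the remainder spread evenly."""
    other = (1.0 - p) / (len(SYMBOLS) - 1)
    return {s: (p if s == main else other) for s in SYMBOLS}


@dataclass
class State:
    gesture: str  # "rest" or a gesture name
    phase: str    # "rest", "pre", "nucleus" or "post"
    emit: dict    # symbol -> probability


def build_flat_hmm(gestures, p_stay=0.6):
    """Flatten per-gesture pre/nucleus/post chains into one HMM.

    gestures maps a name to its (pre, nucleus, post) symbols. A shared rest
    state enters every gesture's pre-stroke state and every post-stroke state
    returns to rest, so gestures can follow one another continuously.
    """
    states = [State("rest", "rest", emission("still"))]
    for name, symbols in gestures.items():
        for phase, sym in zip(("pre", "nucleus", "post"), symbols):
            states.append(State(name, phase, emission(sym)))
    n = len(states)
    trans = [[0.0] * n for _ in range(n)]
    trans[0][0] = p_stay
    for g in range(len(gestures)):
        pre, nucleus, post = 1 + 3 * g, 2 + 3 * g, 3 + 3 * g
        trans[0][pre] = (1.0 - p_stay) / len(gestures)
        for i, nxt in ((pre, nucleus), (nucleus, post), (post, 0)):
            trans[i][i] = p_stay
            trans[i][nxt] = 1.0 - p_stay
    return states, trans


def decode_online(states, trans, observations):
    """Forward filtering: after each frame, yield the most probable
    (gesture, phase) given only the frames seen so far."""
    n = len(states)
    belief = None
    for obs in observations:
        if belief is None:
            prior = [1.0] + [0.0] * (n - 1)  # start in the rest state
        else:
            prior = [sum(belief[i] * trans[i][j] for i in range(n))
                     for j in range(n)]
        belief = [prior[j] * states[j].emit[obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # Sum state posteriors per (gesture, phase) label.
        posterior = {}
        for s, b in zip(states, belief):
            key = (s.gesture, s.phase)
            posterior[key] = posterior.get(key, 0.0) + b
        yield max(posterior, key=posterior.get)


def respond(decoded, response_type):
    """Continuous-response gestures act on every nucleus frame;
    discrete-response gestures fire once, when their nucleus ends."""
    events, prev = [], ("rest", "rest")
    for t, (gesture, phase) in enumerate(decoded):
        if phase == "nucleus" and response_type[gesture] == "continuous":
            events.append(("continuous", gesture, t))
        ended = prev[1] == "nucleus" and (gesture, phase) != prev
        if ended and response_type[prev[0]] == "discrete":
            events.append(("discrete", prev[0], t))
        prev = (gesture, phase)
    return events


gestures = {"swipe_right": ("raise", "move_right", "lower"),  # path-like
            "palm_hold": ("raise", "open_palm", "lower")}     # pose-like
response_type = {"swipe_right": "discrete", "palm_hold": "continuous"}
states, trans = build_flat_hmm(gestures)

swipe = ["still", "still", "raise", "move_right", "move_right", "lower", "still"]
palm = ["still", "raise", "open_palm", "open_palm", "open_palm", "lower", "still"]
swipe_decoded = list(decode_online(states, trans, swipe))
palm_decoded = list(decode_online(states, trans, palm))
print([phase for _, phase in swipe_decoded])
# ['rest', 'rest', 'pre', 'nucleus', 'nucleus', 'post', 'rest']
print(respond(swipe_decoded, response_type))
# [('discrete', 'swipe_right', 5)]
print(respond(palm_decoded, response_type))
# [('continuous', 'palm_hold', 2), ('continuous', 'palm_hold', 3), ('continuous', 'palm_hold', 4)]
```

Keeping every gesture in one flattened state space means a single filtering pass scores all gestures and phases at once, and the decoded phase tells the system when to act: in this toy, `palm_hold` drives feedback on each nucleus frame, while `swipe_right` fires one event as its nucleus ends. Note that both gestures share the same pre-stroke symbol, so the sketch can only tell them apart once the nucleus begins. Adding a gesture here only means adding a dictionary entry and rebuilding; the thesis instead trains the new gesture's HMM from 3-6 recorded repetitions, which this sketch does not cover.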
