Simultaneous categorical and spatio-temporal 3D gestures using Kinect

Recent technological advances have led to an increasing popularity of 3D gesture-based interfaces, in particular in gaming and entertainment consoles. However, unlike 2D gestures, which have been successfully utilized in many multi-touch devices, developing a 3D gesture-based interface is not an easy endeavor. Reasons include the complexity of capturing human movements in 3D and the difficulties associated with recognizing gestures from human motion data. In this work, we target the latter problem by proposing a novel gesture recognition technique for skeletal input data that simultaneously allows for categorical and spatio-temporal gestures. In other words, it recognizes the gesture type and the relative pose within a gesture at the same time. Moreover, our method can learn the gestures that are most appropriate for the user from examples. To avoid the need for user-specific training, we further propose and evaluate several types of feature representations for human pose data. We argue that our approach can facilitate the development of a customizable 3D gesture-based interface and explore how the proposed recognition approach can be smoothly integrated into existing component-based user interface frameworks. Besides a quantitative evaluation, we present a user study in the scenario of a 3D gesture-based interface for an intra-operative medical image viewer. Our studies support the applicability of our method for developing 3D gesture-based interfaces in practice.
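To illustrate the kind of user-independent feature representation discussed above, the sketch below normalizes Kinect-style skeletal joint positions for translation and body-size invariance. This is a hypothetical example under assumed conventions (joints as an array of 3D positions, with `root` and `ref` indexing, say, the hip center and neck), not the actual features evaluated in the paper:

```python
import numpy as np

def pose_feature(joints, root=0, ref=1):
    """Translation- and scale-invariant pose feature (illustrative sketch).

    joints: (J, 3) array-like of 3D joint positions from a skeletal tracker.
    root:   index of the joint used as the coordinate origin (e.g. hip center).
    ref:    index of a joint whose distance to the root sets the body scale
            (e.g. the neck), making the feature invariant to user size.
    """
    j = np.asarray(joints, dtype=float)
    # Translate so the root joint sits at the origin (position invariance).
    j = j - j[root]
    # Divide by the root-to-reference distance (body-size invariance).
    scale = np.linalg.norm(j[ref])
    if scale > 0:
        j = j / scale
    # Flatten to a single feature vector suitable for a recognizer.
    return j.ravel()
```

A feature of this form yields the same vector for the same pose regardless of where the user stands or how tall they are, which is one way to sidestep user-specific training for the recognizer.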
