Beyond Features for Recognition: Human-Readable Measures to Understand Users’ Whole-Body Gesture Performance

ABSTRACT Understanding users’ whole-body gesture performance quantitatively requires numerical gesture descriptors, or features. However, the vast majority of gesture features proposed in the literature were designed specifically for machines to recognize gestures accurately, which makes those features exclusively machine-readable. The complexity of such features, e.g., the Hu moment statistics or the Histogram of Oriented Gradients, makes them difficult for user interface designers, who are not typically experts in machine learning, to understand and use effectively, and considerably reduces designers’ options for describing users’ whole-body gesture performance with legible, easily interpretable numerical measures. To address this problem, we introduce a set of 17 measures that user interface practitioners can readily employ to characterize users’ whole-body gesture performance with human-readable concepts, such as area, volume, or quantity of movement. Our measures describe (1) the spatial characteristics of body movement, (2) kinematic performance, and (3) body posture appearance for whole-body gestures. We evaluate our measures on a public dataset of 5,654 gestures collected from 30 participants, for which we report several findings, e.g., participants performed body gestures within an average volume of space of 1.0 m³, with an average amount of hand movement of 14.6 m and a maximum body posture diffusion of 5.8 m. We also show how our gesture measures relate to the recognition rates delivered by a template-based Nearest-Neighbor whole-body gesture classifier that employs the Dynamic Time Warping dissimilarity function. We release BOGArT, the Body Gesture Analysis Toolkit, which computes our measures automatically. This work empowers researchers and practitioners with new numerical tools to better understand how users perform whole-body gestures and to use this knowledge to inform improved designs of whole-body gesture user interfaces.
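To give a concrete flavor of such human-readable measures and of the template-based recognition baseline, the following minimal Python sketch computes a bounding-box volume of movement and the travel distance of a hand joint from a sequence of skeleton frames, and classifies a gesture with a 1-Nearest-Neighbor rule under the Dynamic Time Warping dissimilarity. The frame layout, the joint names, and the specific measure definitions below are illustrative assumptions only; the paper's 17 measures and the BOGArT toolkit define and compute the actual quantities.

# Illustrative sketch (not the paper's exact formulas). A gesture is a list of
# frames; each frame is a dict mapping a joint name to an (x, y, z) position in meters.
import math

def bounding_volume(frames):
    # Volume (m^3) of the axis-aligned box enclosing all joints across all
    # frames; a simple proxy for the "volume of space" used by a gesture.
    xs = [p[0] for f in frames for p in f.values()]
    ys = [p[1] for f in frames for p in f.values()]
    zs = [p[2] for f in frames for p in f.values()]
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(zs) - min(zs))

def path_length(frames, joint):
    # Total distance (m) traveled by one joint, e.g., a hand.
    return sum(math.dist(prev[joint], curr[joint])
               for prev, curr in zip(frames, frames[1:]))

def frame_distance(f1, f2):
    # Sum of Euclidean distances between corresponding joints of two postures.
    return sum(math.dist(f1[j], f2[j]) for j in f1)

def dtw(a, b, dist):
    # Standard Dynamic Time Warping dissimilarity between two frame sequences.
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(candidate, templates):
    # Template-based 1-Nearest-Neighbor rule: return the label of the training
    # template with the smallest DTW dissimilarity to the candidate gesture.
    return min(templates, key=lambda t: dtw(candidate, t[1], frame_distance))[0]

# Toy usage with two joints per frame (hypothetical data):
g = [{"hand_left": (0.0, 1.0, 2.0), "hand_right": (0.4, 1.0, 2.0)},
     {"hand_left": (0.1, 1.2, 2.0), "hand_right": (0.5, 1.3, 2.1)}]
print(bounding_volume(g), path_length(g, "hand_right"))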
