Unsupervised Extraction of Human-Interpretable Nonverbal Behavioral Cues in a Public Speaking Scenario

We present a framework for unsupervised detection of nonverbal behavioral cues---hand gestures, pose, body movements, etc.---from a collection of motion capture (MoCap) sequences in a public speaking setting. We extract the cues by solving a sparse and shift-invariant dictionary learning problem, known as shift-invariant sparse coding. We find that the extracted behavioral cues are human-interpretable in the context of public speaking. Our technique can automatically identify common patterns of body movement and the time instances at which they occur, minimizing the time and effort needed for manual detection and coding of nonverbal human behaviors.
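To make the idea of shift-invariant sparse coding concrete, the sketch below illustrates only the *encoding* half of the problem on a toy 1-D signal: given a known dictionary of short kernels (here standing in for learned behavioral cues), it greedily finds the (kernel, shift) pairs that best explain the signal via convolutional matching pursuit. This is a minimal illustration under our own assumptions, not the paper's implementation; the paper also learns the dictionary itself, and the function name `conv_matching_pursuit` is ours.

```python
import numpy as np

def conv_matching_pursuit(signal, kernels, n_iters=10):
    """Greedy shift-invariant sparse coding of a 1-D signal.

    At each step, find the (kernel, shift) pair whose correlation with
    the residual is largest in magnitude, record its least-squares
    coefficient, and subtract the scaled, shifted kernel.
    """
    residual = signal.astype(float).copy()
    events = []  # list of (kernel index, shift, coefficient)
    for _ in range(n_iters):
        best = None
        for k, d in enumerate(kernels):
            # cross-correlate the residual with kernel d at every shift
            corr = np.correlate(residual, d, mode="valid")
            t = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[t]) > abs(best[2]):
                best = (k, t, corr[t])
        k, t, c = best
        d = kernels[k]
        coef = c / np.dot(d, d)  # least-squares amplitude for this atom
        residual[t:t + len(d)] -= coef * d
        events.append((k, t, coef))
    return events, residual

# Toy example: a signal built from two shifted, scaled copies of one "cue".
kernel = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
signal = np.zeros(40)
signal[5:10] += 3.0 * kernel
signal[20:25] += 1.5 * kernel

events, residual = conv_matching_pursuit(signal, [kernel], n_iters=2)
shifts = sorted(t for _, t, _ in events)
print(shifts)  # recovered occurrence times of the cue: [5, 20]
```

In the MoCap setting, the signal would be multichannel joint-angle trajectories and each kernel a short movement pattern; the recovered shifts correspond to the time instances at which a behavioral cue occurs.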
