Dynamic Time Warping for Music Conducting Gestures Evaluation

Musical performance by an ensemble of performers often requires a conductor. This paper presents a tool to aid beginners in the study of basic conducting gestures, also known as meter-mimicking gestures. The tool is based on the automatic detection of musical meters and their subdivisions through the analysis of hand gestures. Each meter is represented by a visual conducting pattern performed with the hands, which are tracked using an RGB-D camera. These patterns are recognized and evaluated using a probabilistic framework based on dynamic time warping (DTW). This work makes two main contributions. First, a new distance metric is proposed for DTW, allowing better alignment between two gesture movements without relying on explicit local maxima. Second, the timing precision of the conducting gesture is extracted directly from the warping path, and its accuracy is evaluated by a confidence measure. Experimental results indicate that the classification scheme improves on existing related approaches.
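
To make the role of the warping path concrete, the following is a minimal sketch of classic DTW between a reference gesture and an observed one, each given as a sequence of hand positions. It assumes a plain Euclidean point distance, not the new metric proposed in the paper, and it does not include the probabilistic framework or the confidence measure. The names `dtw` and `frame_offsets` are hypothetical and chosen for illustration only.

```python
import math


def dtw(ref, obs):
    """Classic DTW between two sequences of points (tuples of coordinates).

    Assumption: the local cost is the Euclidean distance between points.
    Returns the accumulated cost and the warping path as (ref_idx, obs_idx) pairs.
    """
    n, m = len(ref), len(obs)
    # acc[i][j] is the cheapest cost of aligning ref[:i] with obs[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], obs[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
        i, j = min(candidates, key=lambda s: acc[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


def frame_offsets(path):
    """Per-step timing offset, in frames, of the observation relative to the reference."""
    return [j - i for i, j in path]


# Toy 2D trajectories: the observed gesture is the reference delayed by one frame.
reference = [(0, 0), (1, 1), (2, 0), (3, 1)]
observed = [(0, 0), (0, 0), (1, 1), (2, 0), (3, 1)]
cost, path = dtw(reference, observed)
offsets = frame_offsets(path)
```

In this toy case the path leaves the diagonal exactly where the observed gesture lags, so timing information can be read off the alignment itself, without first detecting beat points such as local maxima.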
