A method for dance motion recognition and scoring using a two-layer classifier based on a conditional random field and a stochastic error-correcting context-free grammar

This paper presents a unified framework for recognizing and scoring dance motion using a two-layer classifier, so that computational complexity is distributed across the two layers. The research examines the performance of a sliding window, a hidden Markov model (HMM), and a conditional random field (CRF) as the first-layer classifier, which segments the input video into a sequence of motion-primitive labels. The second-layer classifier is a stochastic error-correcting context-free grammar, built from dance-master knowledge, which parses the label sequence, builds a parse tree, and computes the accumulated dance score. The dataset for this research was captured with a single Kinect camera. The training dataset consists of 212 samples of 12 motion primitives and seven videos of Pendet dance performances. Under 5-fold cross-validation, the accuracies of the sliding window, HMM, and CRF are 0.63, 0.79, and 0.86, respectively. This result shows that the CRF achieves higher performance as a dance motion-primitive recognizer than the HMM proposed by [1]. The CRF model achieves an accuracy of 0.88 when the motion feature comprises the angular coordinates of all skeleton joints, as proposed by [2], but this increases to 0.93 when the feature is restricted to upper-body joint coordinates. A stochastic error-correcting context-free grammar is chosen as the dance choreography model. An experiment using synthetic label sequences with cost factor ci = 1 and injected label errors of up to 50 percent shows that the grammar can tolerate label-sequence errors of up to 25 percent. The experiment on Pendet dance performances shows an average dance score of 79.3. The low dance score is attributed to several factors: variation in dance skill, unstable repetition of basic gestures, the high cost incurred when deletions and substitutions of local errors are replaced by insertion operations, duration variation due to the absence of timing guidelines for body-part motions, and a training dataset too limited to capture all possible basic-gesture variations.
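The core idea of the second layer, scoring a performance by the minimum total cost of error-correction operations against a reference choreography, can be illustrated with a minimal sketch. This is not the paper's grammar-based parser: it replaces the stochastic context-free grammar with a plain edit-distance alignment between the recognized label sequence and a single reference sequence, using a uniform cost factor `ci = 1` for insertion, deletion, and substitution. The label names and the linear scoring rule are illustrative assumptions.

```python
def correction_cost(recognized, reference, ci=1):
    """Minimum total edit cost to turn `recognized` into `reference`
    (Wagner-Fischer dynamic programming, uniform cost factor ci)."""
    m, n = len(recognized), len(reference)
    # dp[i][j] = cost of aligning recognized[:i] with reference[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * ci          # delete extra recognized labels
    for j in range(1, n + 1):
        dp[0][j] = j * ci          # insert missing reference labels
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if recognized[i - 1] == reference[j - 1] else ci
            dp[i][j] = min(dp[i - 1][j] + ci,       # deletion
                           dp[i][j - 1] + ci,       # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[m][n]

def dance_score(recognized, reference, ci=1):
    """Illustrative score out of 100: full marks minus a penalty
    proportional to the correction cost per reference label."""
    cost = correction_cost(recognized, reference, ci)
    return max(0.0, 100.0 * (1 - cost / len(reference)))

# Hypothetical motion-primitive labels (placeholders, not the paper's label set)
reference  = ["agem", "ngeseh", "ulap", "agem", "ngeseh", "ulap", "agem", "seledet"]
recognized = ["agem", "ngeseh", "ulap", "agem", "ulap", "agem", "seledet"]  # one label dropped
print(dance_score(recognized, reference))  # one deletion out of 8 labels -> 87.5
```

Under this linear penalty, a 25 percent label-error rate still leaves a score of 75, which is consistent with the tolerance threshold reported in the abstract; the actual parser additionally exploits the grammar's production probabilities when choosing corrections.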