Paper Information - Listen to Dance: Music-driven choreography generation using Autoregressive Encoder-Decoder Network

Automatic choreography generation is a challenging task because it often requires an understanding of two abstract concepts, music and dance, which are realized in two different modalities: audio and video, respectively. In this paper, we propose a music-driven choreography generation system using an autoregressive encoder-decoder network. To this end, we first collect a set of multimedia clips that include both music and the corresponding dance motion. We then extract the joint coordinates of the dancer from the video and the mel-spectrogram of the music from the audio, and train our network on the resulting music-choreography pairs. Finally, at inference time, a novel dance motion is generated when only music is given as input. We performed a user study for a qualitative evaluation of the proposed method, and the results show that the proposed model is able to generate musically meaningful and natural dance movements given an unheard song.
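The autoregressive inference step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the network's layers, feature sizes, or training details, so everything below is an assumption made for illustration: the names `decode_step` and `generate_dance`, the sizes `N_MELS` and `N_COORDS`, and the random linear weights that stand in for a trained encoder-decoder. The sketch only shows the control flow: each generated pose is fed back as input when producing the next one, conditioned on the current mel-spectrogram frame.

```python
import random

N_MELS = 8    # assumed number of mel bands per audio frame (toy value)
N_COORDS = 6  # assumed pose size: 3 joints x (x, y) coordinates (toy value)


def make_toy_weights(rows, cols, seed):
    """Random matrix standing in for trained parameters (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


W_MUSIC = make_toy_weights(N_COORDS, N_MELS, seed=0)
W_POSE = make_toy_weights(N_COORDS, N_COORDS, seed=1)


def decode_step(music_frame, prev_pose):
    """Hypothetical decoder step: next pose from the current audio frame
    and the previously generated pose (a linear stand-in, not the paper's model)."""
    from_music = matvec(W_MUSIC, music_frame)
    from_pose = matvec(W_POSE, prev_pose)
    # Predict a displacement and add it to the previous pose.
    return [p + m + q for p, m, q in zip(prev_pose, from_music, from_pose)]


def generate_dance(mel_frames, start_pose):
    """Autoregressive rollout: one pose per audio frame, with each output
    fed back as the 'previous pose' for the following frame."""
    poses = []
    prev = list(start_pose)
    for frame in mel_frames:
        prev = decode_step(frame, prev)
        poses.append(prev)
    return poses
```

Calling `generate_dance(mel_frames, start_pose)` with a list of per-frame mel vectors returns one joint-coordinate vector per frame. In a real system the mel frames would come from an audio feature extractor and the step function from a trained network; here both are placeholders.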
