Listen to Dance: Music-driven choreography generation using Autoregressive Encoder-Decoder Network

Automatic choreography generation is a challenging task because it often requires an understanding of two abstract concepts - music and dance - which are realized in the two different modalities, namely audio and video, respectively. In this paper, we propose a music-driven choreography generation system using an auto-regressive encoder-decoder network. To this end, we first collect a set of multimedia clips that include both music and corresponding dance motion. We then extract the joint coordinates of the dancer from video and the mel-spectrogram of music from audio, and train our network using music-choreography pairs as input. Finally, a novel dance motion is generated at the inference time when only music is given as an input. We performed a user study for a qualitative evaluation of the proposed method, and the results show that the proposed model is able to generate musically meaningful and natural dance movements given an unheard song.

[1]  Minho Lee,et al.  Music similarity-based approach to generating dance motion sequence , 2012, Multimedia Tools and Applications.

[2]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[3]  A. Murat Tekalp,et al.  An audio-driven dancing avatar , 2008, Journal on Multimodal User Interfaces.

[4]  Geoffrey E. Hinton,et al.  Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[5]  Eric Feron,et al.  Modeling musically meaningful choreography , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[6]  Jürgen Schmidhuber,et al.  Highway Networks , 2015, ArXiv.

[7]  C. Krumhansl,et al.  Can Dance Reflect the Structural and Expressive Qualities of Music? A Perceptual Experiment on Balanchine's Choreography of Mozart's Divertimento No. 15 , 1997 .

[8]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[10]  Hideyuki Tachibana,et al.  Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  P. Pasquier,et al.  GrooveNet : Real-Time Music-Driven Dance Movement Generation using Artificial Neural Networks , 2017 .