ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit

Dance and music are two highly correlated artistic forms, and synthesizing dance motions from music has recently attracted much attention. Most previous works perform music-to-dance synthesis by directly mapping music to human skeleton keypoints. In contrast, human choreographers design dance motions from music in a two-stage manner: they first devise multiple choreographic action units (CAUs), each consisting of a series of dance motions, and then arrange the CAU sequence according to the rhythm, melody, and emotion of the music. Inspired by this, we systematically study this two-stage choreography approach and construct a dataset that incorporates such choreography knowledge. Based on the constructed dataset, we design a two-stage music-to-dance synthesis framework, ChoreoNet, to imitate the human choreography procedure. Our framework first employs a CAU prediction model to learn the mapping between music and CAU sequences; a spatial-temporal inpainting model then converts the CAU sequence into continuous dance motions. Experimental results demonstrate that the proposed ChoreoNet outperforms baseline methods (0.622 in terms of CAU BLEU score and 1.59 in terms of user study score).
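The two-stage pipeline described above can be sketched in PyTorch. This is a hypothetical minimal illustration, not the authors' implementation: the module names (`CAUPredictor`, `MotionInpainter`), all dimensions, and the CAU vocabulary size are assumptions made for the example.

```python
import torch
import torch.nn as nn


class CAUPredictor(nn.Module):
    """Stage 1 (sketch): map a music feature sequence to per-step CAU logits."""

    def __init__(self, music_dim=16, hidden_dim=32, n_caus=10):
        super().__init__()
        self.gru = nn.GRU(music_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_caus)

    def forward(self, music_feats):
        # music_feats: (batch, time, music_dim)
        h, _ = self.gru(music_feats)
        return self.head(h)  # (batch, time, n_caus) logits over CAU tokens


class MotionInpainter(nn.Module):
    """Stage 2 (sketch): smooth CAU keyframe poses into continuous motion."""

    def __init__(self, pose_dim=24, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, keyframe_poses):
        # keyframe_poses: (batch, time, pose_dim), poses sampled from the
        # predicted CAU sequence; the network fills in transitions.
        h, _ = self.gru(keyframe_poses)
        return self.out(h)  # (batch, time, pose_dim) continuous motion


if __name__ == "__main__":
    music = torch.randn(2, 8, 16)          # toy music features
    cau_logits = CAUPredictor()(music)     # stage 1: music -> CAU sequence
    poses = torch.randn(2, 8, 24)          # toy poses looked up from CAUs
    motion = MotionInpainter()(poses)      # stage 2: CAUs -> smooth motion
    print(cau_logits.shape, motion.shape)
```

The key design point the paper argues for is the intermediate CAU representation: stage 1 predicts a discrete sequence of choreographic units (which is why a BLEU-style metric applies), and stage 2 turns that symbolic sequence into continuous skeleton motion.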
