Music-Driven Dance Generation

This paper proposes a novel model for synthesizing dance movements from music or audio sequences, with a variety of potential applications such as virtual reality. To generate musically meaningful and natural dance movements for a previously unheard song, two criteria should be met: 1) the rhythm of the dance movements should be in harmony with the music beat; 2) the generated dance movements should show notable yet natural variation. Specifically, a sequence-to-sequence (Seq2Seq) learning architecture that combines Long Short-Term Memory (LSTM) networks with a self-attention (SA) mechanism is proposed for dance generation. The main contributions of this work are as follows: 1) a cross-domain Seq2Seq learning framework is proposed for realistic dance generation; 2) a set of evaluation criteria is proposed for assessing synthesized dances, for which no ground-truth reference exists; 3) a dance dataset containing both music and the corresponding dance motions has been collected, and the proposed method achieves results that are highly competitive with the state of the art.
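To make the data flow of an LSTM plus self-attention Seq2Seq model more concrete, here is a minimal, untrained sketch in pure Python. It is not the authors' implementation. The following are all illustrative assumptions: the layer sizes, the absence of biases and of learned query/key/value projections, initialising the decoder from the encoder's final state, producing exactly one pose per audio frame, representing music as MFCC-like feature vectors and each pose as a flattened vector of 2D keypoints, and the hypothetical names `LSTMCell`, `self_attention` and `MusicToDanceSeq2Seq`.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """One LSTM cell with small random (untrained) weights and no biases."""

    def __init__(self, in_dim, hid_dim, rng):
        n = in_dim + hid_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hid_dim)] for _ in range(4)]

    def step(self, x, h, c):
        z = x + h  # list concatenation: [input; previous hidden state]
        i, f, o, g = (matvec(W, z) for W in self.W)
        c_new = [sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
                 for ik, fk, gk, ck in zip(i, f, g, c)]
        h_new = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def self_attention(H):
    """Scaled dot-product self-attention with Q = K = V = H (no learned projections)."""
    d = len(H[0])
    out = []
    for q in H:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in H]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, H)) for j in range(d)])
    return out


class MusicToDanceSeq2Seq:
    """Hypothetical LSTM encoder + self-attention + autoregressive LSTM decoder."""

    def __init__(self, audio_dim, hid_dim, pose_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = LSTMCell(audio_dim, hid_dim, rng)
        self.decoder = LSTMCell(pose_dim + hid_dim, hid_dim, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hid_dim)]
                      for _ in range(pose_dim)]
        self.hid_dim, self.pose_dim = hid_dim, pose_dim

    def generate(self, audio_frames):
        # 1) Encode the music feature sequence with the LSTM encoder.
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        enc_states = []
        for x in audio_frames:
            h, c = self.encoder.step(x, h, c)
            enc_states.append(h)
        # 2) Every encoder state attends to all encoder states.
        contexts = self_attention(enc_states)
        # 3) Decode one pose per audio frame; the decoder starts from the
        #    encoder's final state and is fed its own previous pose.
        pose, poses = [0.0] * self.pose_dim, []
        for ctx in contexts:
            h, c = self.decoder.step(pose + ctx, h, c)
            pose = matvec(self.W_out, h)
            poses.append(pose)
        return poses


if __name__ == "__main__":
    rng = random.Random(42)
    # 8 frames of 13-dimensional MFCC-like features (random stand-ins).
    audio = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(8)]
    # Hypothetical pose size: 18 keypoints x (x, y), flattened per frame.
    model = MusicToDanceSeq2Seq(audio_dim=13, hid_dim=16, pose_dim=36)
    poses = model.generate(audio)
    print(len(poses), len(poses[0]))  # 8 36
```

Feeding the previously generated pose back into the decoder is what makes generation autoregressive, while the self-attention step lets each time step draw on the whole music context rather than only the LSTM's running state. A trained model would learn all of these weights from paired music and dance data.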
