Paper Information - Music-Driven Dance Generation

Music-Driven Dance Generation

This paper proposes a novel model for synthesizing dance movements from a music (audio) sequence, a task with a variety of potential applications, e.g., virtual reality. To generate musically meaningful and natural dance movements for a previously unheard song, two criteria should be met: 1) the rhythm of the dance actions should be in harmony with the music beat; 2) the generated dance movements should show notable and natural variation. Specifically, a sequence-to-sequence (Seq2Seq) learning architecture that leverages Long Short-Term Memory (LSTM) and a Self-Attention (SA) mechanism is proposed for dance generation. The work is interesting in the following aspects: 1) a cross-domain Seq2Seq learning framework is proposed for realistic dance generation; 2) a set of evaluation criteria is proposed for evaluating synthesized results that have no reference source; 3) a dance dataset that includes both music and the corresponding dance motions is collected, and very competitive results are obtained against state-of-the-art methods.
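The abstract names self-attention as one building block but gives no architectural details. As a rough illustration only, the sketch below shows generic scaled dot-product self-attention over a short sequence of per-frame music features. The identity query/key/value projections, the function names, and the toy feature values are all assumptions made for this example; it is not the authors' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights); weights[t] is the attention
    distribution of frame t over all frames and sums to 1.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        attn = softmax(scores)
        # Weighted average of all frames, one feature dimension at a time.
        mixed = [sum(a * frame[j] for a, frame in zip(attn, frames))
                 for j in range(dim)]
        weights.append(attn)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical 2-D audio features for three frames (illustrative values).
music_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(music_frames)
```

In a Seq2Seq setting, a layer of this kind lets each time step draw on every other step of the sequence directly, rather than relying only on the recurrent state of an LSTM.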
