Rhythm is a Dancer: Music-Driven Motion Synthesis With Global Structure

Synthesizing human motion with a global structure, such as a choreography, is a challenging task. Existing methods tend to concentrate on smooth local pose transitions and neglect the global context or theme of the motion. In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions that are synchronized with the input beats and jointly form a global structure respecting a specific dance genre. In addition, our framework enables the generation of diverse motions that are controlled by the content of the music, not only by the beat. Our music-driven dance synthesis framework is a hierarchical system consisting of three levels: pose, motif, and choreography. The pose level consists of an LSTM component that generates temporally coherent sequences of poses. The motif level guides sets of consecutive poses to form a movement that belongs to a specific distribution, using a novel motion perceptual loss. Finally, the choreography level selects the order of the performed movements and drives the system to follow the global structure of a dance genre. Our results demonstrate the effectiveness of our framework in generating natural and consistent movements across various dance types, controlling the content of the synthesized motions, and respecting the overall structure of the dance.
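To make the three-level hierarchy concrete, the following is a minimal PyTorch sketch of one plausible reading of the architecture. The feature dimensions, the pretrained `motion_encoder`, and the genre transition matrix `transitions` are hypothetical placeholders introduced for illustration; this is not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class PoseGenerator(nn.Module):
    """Pose level: an LSTM that autoregressively generates temporally
    coherent poses, conditioned on per-frame audio features."""
    def __init__(self, pose_dim=72, audio_dim=35, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim + audio_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, prev_poses, audio, state=None):
        # prev_poses: (B, T, pose_dim), audio: (B, T, audio_dim)
        h, state = self.lstm(torch.cat([prev_poses, audio], dim=-1), state)
        return self.out(h), state

def motif_perceptual_loss(motion_encoder, generated, reference):
    """Motif level: pull a window of consecutive generated poses toward a
    target movement distribution by matching deep features of a pretrained
    motion encoder, analogous to image-space perceptual losses."""
    return torch.mean((motion_encoder(generated) - motion_encoder(reference)) ** 2)

def choreography_schedule(transitions, start_motif, num_motifs):
    """Choreography level: sample an ordered sequence of motif labels from a
    genre-specific (row-stochastic) transition matrix, imposing the dance's
    global structure."""
    order = [start_motif]
    for _ in range(num_motifs - 1):
        order.append(torch.multinomial(transitions[order[-1]], 1).item())
    return order
```

Under this reading, the choreography level fixes the motif order for the whole dance, the motif-level loss shapes each window of generated poses toward its assigned movement class, and the pose-level LSTM fills in beat-synchronized frame transitions.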
