Dance with Melody: An LSTM-autoencoder Approach to Music-oriented Dance Synthesis

Dance is greatly influenced by music. Research on synthesizing music-oriented dance choreography can advance many fields, such as dance teaching and human behavior research. Although considerable effort has been directed toward investigating the relationship between music and dance, synthesizing appropriate dance choreography from music remains an open problem. There are two main challenges: 1) how to choose appropriate dance figures, i.e., groups of steps that are named and specified in technical dance manuals, in accordance with the music, and 2) how to artistically enhance the choreography in accordance with the music. To address these challenges, we propose a music-oriented dance choreography synthesis method that uses a long short-term memory (LSTM)-autoencoder model to learn a mapping between acoustic and motion features. We further improve the model with temporal indexes and a masking method to achieve better performance. Because of the lack of data available for model training, we constructed a music-dance dataset containing choreographies for four types of dance, totaling 907,200 frames of 3D dance motions with accompanying music, and extracted multidimensional features for model training. We employed this dataset to train and optimize the proposed models and conducted several qualitative and quantitative experiments to select the best-fitting model. Our model proved effective and efficient at synthesizing valid choreographies that also exhibit musical expressiveness.
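The sequence-to-sequence idea behind the approach, an encoder that compresses a sequence of per-frame acoustic features into a hidden state and a decoder that unrolls a motion sequence from it, can be sketched as below. This is a minimal illustration with random, untrained weights, not the paper's actual architecture or feature set; the dimensions, the `synthesize_motion` helper, and the autoregressive pose feedback are all assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (Hochreiter & Schmidhuber, 1997) with random weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)          # update cell state
        h = o * np.tanh(c)                  # emit hidden state
        return h, c

def synthesize_motion(acoustic_seq, motion_dim=63, hidden_dim=32, seed=0):
    """Encode an acoustic feature sequence, then decode a motion sequence
    of the same length (one pose of `motion_dim` joint coordinates per frame).
    All dimensions here are illustrative, not the paper's."""
    rng = np.random.default_rng(seed)
    T, acoustic_dim = acoustic_seq.shape
    encoder = LSTMCell(acoustic_dim, hidden_dim, rng)
    decoder = LSTMCell(motion_dim, hidden_dim, rng)
    W_out = rng.standard_normal((motion_dim, hidden_dim)) * 0.1

    # Encoder: compress the whole acoustic sequence into a context state.
    h = c = np.zeros(hidden_dim)
    for t in range(T):
        h, c = encoder.step(acoustic_seq[t], h, c)

    # Decoder: unroll from the context, feeding back the previous pose.
    poses, prev_pose = [], np.zeros(motion_dim)
    for _ in range(T):
        h, c = decoder.step(prev_pose, h, c)
        prev_pose = W_out @ h
        poses.append(prev_pose)
    return np.stack(poses)                  # shape (T, motion_dim)

# Toy input: 120 music frames, each with a 16-dimensional acoustic feature vector.
music = np.random.default_rng(1).standard_normal((120, 16))
dance = synthesize_motion(music)
print(dance.shape)  # (120, 63)
```

In an actual trained model, the weights would be fitted on paired music-motion data so that the decoded pose sequence follows the acoustic input; the sketch only shows the data flow.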
