MG-VAE: Deep Chinese Folk Songs Generation with Specific Regional Style

Regional style in Chinese folk songs is a valuable resource for ethnic music creation and folk culture research. In this paper, we propose MG-VAE, a music generative model based on the VAE (Variational Auto-Encoder) that can capture a specific musical style and generate novel tunes for Chinese folk songs (Min Ge) in a controllable way. Specifically, we use adversarial training to disentangle the VAE's latent space into four parts that control the pitch and rhythm sequences as well as the musical style and content. Two classifiers are used to separate the style and content latent spaces, and temporal supervision is used to disentangle the pitch and rhythm sequences. Experimental results show that the disentanglement is successful and that our model can create novel folk songs with controllable regional styles. To the best of our knowledge, this is the first study to apply deep generative models and adversarial training to Chinese music generation.
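The abstract does not give latent dimensions, classifier architectures, or the exact adversarial objective. The sketch below is therefore a minimal, pure-Python illustration under stated assumptions of how a split VAE latent space could be sampled and scored. It uses the standard reparameterization trick and per-dimension KL term, slices the latent vector into four named parts, applies a cross-entropy loss for a style classifier that predicts the region from the style part, and computes an entropy term for an adversary reading the content part. The sizes in `LATENT_SIZES` and all function names are hypothetical. The entropy-maximizing adversary is a formulation used in some text style-transfer work and is not necessarily the one MG-VAE uses.

```python
import math
import random

# Hypothetical sizes for the four latent sub-spaces (assumption: the abstract
# names four parts but gives no dimensions).
LATENT_SIZES = {"pitch": 3, "rhythm": 3, "style": 2, "content": 2}


def split_latent(z):
    """Slice a flat latent vector into the four named sub-spaces, in order."""
    parts, start = {}, 0
    for name, size in LATENT_SIZES.items():
        parts[name] = z[start:start + size]
        start += size
    if start != len(z):
        raise ValueError(f"expected {start} latent dims, got {len(z)}")
    return parts


def reparameterize(mu, log_var, rng):
    """Standard VAE sampling: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over all latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Style-classifier loss: predict the region label from the style part."""
    return -math.log(softmax(logits)[label])


def entropy(logits):
    """Entropy of the adversary's region prediction from the content part.

    Assumption: the encoder is trained to maximize this value, so that the
    content part carries as little regional-style information as possible.
    """
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.0] * 10, [0.0] * 10  # stand-ins for encoder outputs
    parts = split_latent(reparameterize(mu, log_var, rng))
    print({k: len(v) for k, v in parts.items()}, kl_to_standard_normal(mu, log_var))
```

In a full model, the style and adversary logits would come from trained classifiers. The total loss would combine reconstruction, KL, the style cross-entropy, and the negated adversary entropy. The abstract does not state how these terms are weighted or how temporal supervision of the pitch and rhythm parts is formulated, so both are left out here.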
