Improving Automatic Jazz Melody Generation by Transfer Learning Techniques

In this paper, we tackle the problem of transfer learning for automatic Jazz melody generation. Jazz is one of the most representative genres of music, but the scarcity of Jazz data in MIDI format hinders the construction of a generative model for Jazz. Transfer learning addresses such data insufficiency by transferring features that are common across domains from a source domain to a target domain. In view of its success in other machine learning problems, we investigate whether, and to what extent, it can help improve automatic music generation for under-resourced musical genres. Specifically, we use a recurrent variational autoencoder as the generative model, with a genre-unspecified dataset as the source dataset and a Jazz-only dataset as the target dataset. Two transfer learning methods are evaluated under six source-to-target data ratios. The first method trains the model on the source dataset and then fine-tunes the resulting model parameters on the target dataset. The second method trains the model on the source and target datasets jointly, but adds genre labels to the latent vectors and uses a genre classifier to improve Jazz generation. Our subjective evaluation shows that both methods outperform, by a large margin, a baseline trained on Jazz data only. Of the two, the first method appears to perform better. Our evaluation also reveals the limits of existing objective metrics for assessing the performance of music generation models.
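To make the two strategies concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how they could be implemented, assuming a GRU-based recurrent variational autoencoder over symbolic melody tokens. All names here (RecurrentVAE, step, the loss weights, the token vocabulary size) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentVAE(nn.Module):
    """Hypothetical recurrent VAE over melody token sequences."""
    def __init__(self, vocab_size=130, hidden=256, latent=64, n_genres=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Method 2: the genre label is appended to the latent vector before
        # decoding, and a classifier on z pushes the latent space to be
        # genre-aware.
        self.init_h = nn.Linear(latent + n_genres, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)
        self.genre_clf = nn.Linear(latent, n_genres)

    def forward(self, tokens, genre_onehot):
        x = self.embed(tokens)                                    # (B, T, H)
        _, h = self.encoder(x)                                    # (1, B, H)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterize
        h0 = torch.tanh(self.init_h(torch.cat([z, genre_onehot], dim=-1)))
        dec, _ = self.decoder(x[:, :-1], h0.unsqueeze(0))         # teacher forcing
        return self.out(dec), mu, logvar, self.genre_clf(z)

def step(model, opt, tokens, genre, beta=0.2, gamma=1.0, use_clf=True):
    """One training step: reconstruction + KL (+ optional genre-classifier loss)."""
    logits, mu, logvar, genre_logits = model(tokens, F.one_hot(genre, 2).float())
    recon = F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + beta * kl
    if use_clf:                                                   # Method 2 only
        loss = loss + gamma * F.cross_entropy(genre_logits, genre)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Method 1 (pretrain + fine-tune): run `step` with use_clf=False over the
# genre-unspecified source data first, then continue training the same model
# (possibly with a smaller learning rate) on the Jazz-only target data.
# Method 2 (joint training): mix source and Jazz batches at a chosen
# source-to-target ratio and run `step` with use_clf=True throughout.
```

The design choice in this sketch is that both methods share one model; they differ only in the data schedule and in whether the genre-label conditioning and classifier loss are active, which mirrors the comparison described in the abstract.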
