A GAN Model With Self-attention Mechanism To Generate Multi-instruments Symbolic Music

GANs have recently been shown to generate symbolic music in the form of piano-rolls. However, existing GAN-based multi-track music generation methods are often unstable during training. Moreover, because of weaknesses in temporal feature extraction, the generated multi-track music does not sound natural enough. We therefore propose DMB-GAN, a new GAN model with a self-attention mechanism that extracts richer temporal features and generates multi-instrument music stably. First, to generate more coherent and natural single-track music, we introduce a self-attention mechanism that enables the GAN-based generation model to extract temporal as well as spatial features. Second, to generate multi-instrument music with harmonic structure across all tracks, we construct a dual generative adversarial architecture with multiple branches, one branch per track. Finally, to improve the quality of the generated multi-instrument symbolic music, we introduce switchable normalization to stabilize network training. Experimental results show that DMB-GAN stably generates coherent, natural multi-instrument music of good quality.
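To make the self-attention idea concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention applied over the time axis of a piano-roll feature map, in the SAGAN style the abstract builds on. The shapes, weight matrices, and the additive residual are illustrative assumptions, not the paper's exact layer (SAGAN additionally scales the attention output by a learnable gamma initialized to zero).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the time axis of a
    piano-roll feature map x of shape (T, C): every time step attends
    to every other step, capturing long-range temporal structure that
    plain convolutions miss."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # queries, keys, values
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention map
    return x + scores @ v                      # residual connection

rng = np.random.default_rng(0)
T, C = 16, 8   # 16 time steps, 8 feature channels (illustrative sizes)
x = rng.standard_normal((T, C))
Wq, Wk, Wv = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
y = self_attention(x, Wq, Wk, Wv)
print(y.shape)  # same shape as the input feature map
```

Because each row of the attention map is a softmax distribution over all time steps, a generated note can depend on material many bars earlier, which is the temporal-consistency property the abstract attributes to self-attention.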
