A GAN Model With Self-attention Mechanism To Generate Multi-instruments Symbolic Music

GANs have recently been shown to generate symbolic music in the form of piano-rolls. However, existing GAN-based multi-track music generation methods are often unstable during training. Moreover, because of weaknesses in temporal feature extraction, the generated multi-track music does not sound natural enough. We therefore propose DMB-GAN, a new GAN model with a self-attention mechanism that extracts richer temporal features and generates multi-instrument music stably. First, to generate more coherent and natural single-track music, we introduce a self-attention mechanism that enables the GAN-based generation model to extract temporal as well as spatial features. Second, to generate multi-instrument music with harmonic structure across all tracks, we construct a dual generative adversarial architecture with multiple branches, one branch per track. Finally, to improve the quality of the generated multi-instrument symbolic music, we introduce switchable normalization to stabilize network training. Experimental results show that DMB-GAN stably generates coherent, natural multi-instrument music of good quality.
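To make the self-attention idea concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention applied over the time axis of a piano-roll feature map, in the SAGAN style the abstract builds on. The shapes, weight matrices, and the additive residual are illustrative assumptions, not the paper's exact layer (SAGAN additionally scales the attention output by a learnable gamma initialized to zero).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the time axis of a
    piano-roll feature map x of shape (T, C): every time step attends
    to every other step, capturing long-range temporal structure that
    plain convolutions miss."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # queries, keys, values
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention map
    return x + scores @ v                      # residual connection

rng = np.random.default_rng(0)
T, C = 16, 8   # 16 time steps, 8 feature channels (illustrative sizes)
x = rng.standard_normal((T, C))
Wq, Wk, Wv = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
y = self_attention(x, Wq, Wk, Wv)
print(y.shape)  # same shape as the input feature map
```

Because each row of the attention map is a softmax distribution over all time steps, a generated note can depend on material many bars earlier, which is the temporal-consistency property the abstract attributes to self-attention.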
