A Variant Model of TGAN for Music Generation

In the past five years, we have seen an increase in generative adversarial networks (GANs) and their applications for image generation. Due to the randomness and unpredictability of the structure of music, music generation is well suited to the use of GANs. Numerous studies have been published on music generation by using temporal GANs. However, few studies have focused on the relationships between melodies and chords, and the effects of latent space on time sequence. We also propose a new method to implement latent structure on GANs for music generation. The main innovation of the proposed model is the use of new discriminator to recognize the time sequence of music and use of a pretrained beat generator to improve the quality of patterned melodies and chords. Results indicated that the pretrained model improved the quality of generated music.

[1]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[2]  Olof Mogren,et al.  C-RNN-GAN: Continuous recurrent neural networks with adversarial training , 2016, ArXiv.

[3]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[4]  Shunta Saito,et al.  Temporal Generative Adversarial Nets with Singular Value Clipping , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[6]  Song-Chun Zhu,et al.  Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[8]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[9]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[10]  Yi-Hsuan Yang,et al.  MidiNet: A Convolutional Generative Adversarial Network for Symbolic-Domain Music Generation , 2017, ISMIR.

[11]  Yi-Hsuan Yang,et al.  MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks , 2017, ArXiv.

[12]  Yi-Hsuan Yang,et al.  MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment , 2017, AAAI.

[13]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[14]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[15]  Jan Kautz,et al.  MoCoGAN: Decomposing Motion and Content for Video Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[17]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[18]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[20]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.