Generating Music by Fine-Tuning Recurrent Neural Networks with Reinforcement Learning

Supervised learning with a next-step prediction objective is a common way to train sequence models; however, it suffers from known failure modes, and it is notoriously difficult to train such models to capture certain properties, such as a coherent global structure. Reinforcement learning can be used to impose arbitrary properties on generated data by choosing appropriate reward functions. In this paper we propose a novel approach to sequence training in which we refine a sequence predictor by optimizing for imposed reward functions, while maintaining the good predictive properties learned from data. We propose efficient ways to solve this problem by augmenting deep Q-learning with a cross-entropy reward and by deriving novel off-policy methods for RNNs from stochastic optimal control (SOC). We explore the usefulness of our approach in the context of music generation: an LSTM is trained on a large corpus of songs to predict the next note in a musical sequence, and this Note-RNN is then refined using RL, where the reward function combines rewards based on rules of music theory with the output of another trained Note-RNN. We show that this combination of maximum likelihood (ML) and RL not only produces more pleasing melodies, but also significantly reduces unwanted behaviors and failure modes of the RNN.
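
To make the training signal concrete, here is a minimal, self-contained PyTorch sketch of one Q-learning update with a combined reward of the form the abstract describes: the log-likelihood of the chosen note under a frozen copy of the pre-trained Note-RNN, plus a scaled rule-based music-theory reward. Everything here is an illustrative assumption rather than the paper's exact system: the toy `music_theory_reward` (the paper uses a full set of music-theory rules), the constants `C` and `GAMMA`, the network sizes, and the plain max-target update (the paper builds on a DQN-style learner).

```python
# Sketch of refining a Note-RNN with a combined reward
# r(s, a) = log p(a | s) + r_MT(s, a) / C, under assumed sizes and constants.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NOTES = 38   # assumed action-space size (one action per note event)
C = 0.5          # assumed weighting constant for the music-theory reward
GAMMA = 0.99     # discount factor

class NoteRNN(nn.Module):
    """LSTM over one-hot note events; the head serves as next-note logits
    for the reward RNN, or as Q(s, a) values for the Q-network."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(NUM_NOTES, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_NOTES)

    def forward(self, notes, state=None):
        out, state = self.lstm(notes, state)
        return self.head(out[:, -1]), state  # scores for the next note

def music_theory_reward(prev_note, note):
    """Toy stand-in for the paper's rule-based rewards: penalize repeated notes."""
    return torch.where(note == prev_note, torch.tensor(-1.0), torch.tensor(0.0))

reward_rnn = NoteRNN()   # stands in for a frozen copy of the pre-trained Note-RNN
q_net = NoteRNN()        # Q-network (in the full system, initialized from Note-RNN weights)
target_net = NoteRNN()   # slowly-updated target network
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def one_hot(idx):
    return F.one_hot(idx, NUM_NOTES).float().unsqueeze(1)  # (batch, seq=1, notes)

# One Q-learning step on a single (s, a, s') transition of note indices.
prev_note = torch.tensor([5])
action = torch.tensor([7])

with torch.no_grad():
    # Reward term 1: log-likelihood of the action under the frozen reward RNN.
    logits, _ = reward_rnn(one_hot(prev_note))
    log_p = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    # Reward term 2: rule-based music-theory reward, scaled by 1/C.
    r = log_p + music_theory_reward(prev_note, action) / C
    # Bootstrapped target from the target network at the next state.
    next_q, _ = target_net(one_hot(action))
    target = r + GAMMA * next_q.max(dim=-1).values

q_pred, _ = q_net(one_hot(prev_note))
loss = F.mse_loss(q_pred.gather(1, action.unsqueeze(1)).squeeze(1), target)
opt.zero_grad()
loss.backward()
opt.step()
```

In the full system, both the Q-network and the target network would start from the trained Note-RNN weights, so the RL fine-tuning begins at, and stays close to, the supervised solution while the music-theory terms steer it away from the RNN's failure modes.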
