Coupled Recurrent Models for Polyphonic Music Composition

This paper introduces a novel recurrent model for music composition that is tailored to the structure of polyphonic music. We propose an efficient new conditional probabilistic factorization of musical scores, viewing a score as a collection of concurrent, coupled sequences, i.e., voices. To model the conditional distributions, we borrow ideas from both convolutional and recurrent neural models; we argue that these ideas are natural for capturing music's pitch invariances, temporal structure, and polyphony. We train models for single-voice and multi-voice composition on 2,300 scores from the KernScores dataset.
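To make the factorization concrete, one natural reading of "a collection of concurrent, coupled sequences" is a voice-wise autoregressive decomposition. The sketch below is an illustration under that assumption (the symbols $x_{v,t}$, $V$, and $T$ are notation introduced here, not taken from the paper): each voice's note at time $t$ is conditioned on the full history of all voices, so the voices remain coupled rather than modeled independently.

$$
p(x_{1:V,\,1:T}) \;=\; \prod_{t=1}^{T} \prod_{v=1}^{V} p\!\left(x_{v,t} \,\middle|\, x_{1:V,\,<t},\; x_{<v,\,t}\right),
$$

where $x_{v,t}$ denotes the event produced by voice $v$ at time step $t$, $x_{1:V,\,<t}$ is the history of all $V$ voices before time $t$, and $x_{<v,\,t}$ covers the voices already generated at the current step. Each conditional would then be parameterized by the recurrent/convolutional model described in the paper.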
