Imposing Higher-Level Structure in Polyphonic Music Generation Using Convolutional Restricted Boltzmann Machines and Constraints

We introduce a method for imposing higher-level structure on generated polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) serves as the generative model and is combined with gradient-descent constraint optimisation to provide further control over the generation process. Among other things, this allows the use of a "template" piece, from which structural properties are extracted and transferred as constraints to the newly generated material. The sampling process is guided by simulated annealing to avoid local optima and to find solutions that both satisfy the constraints and are relatively stable with respect to the C-RBM. Results show that this approach makes it possible to control the higher-level self-similarity structure, the meter, and the tonal properties of the resulting piece while preserving its local musical coherence.
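The interplay of sampling, constraint optimisation, and annealing described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain binary RBM with random weights in place of a trained C-RBM, a single hypothetical constraint (matching a template's note density), and a combined cost whose 0.01 weighting of the free energy is an arbitrary choice. All function names and parameters are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b, c):
    # Free energy of a binary RBM; lower values mean the model finds v more plausible.
    return -v @ b - np.sum(np.logaddexp(0.0, v @ W + c))

def gibbs_step(v, W, b, c, rng):
    # One round of block Gibbs sampling: visible -> hidden -> visible.
    h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
    return (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)

def constraint_cost(v, target_density):
    # Hypothetical constraint: mean activation should match the template's density.
    return (v.mean() - target_density) ** 2

def constraint_grad(v, target_density):
    # Gradient of constraint_cost with respect to every entry of v.
    return 2.0 * (v.mean() - target_density) / v.size * np.ones_like(v)

def constrained_sample(W, b, c, target_density, steps=200, lr=5.0, t0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    v = (rng.random(b.shape) < 0.5).astype(float)

    def cost(x):
        # Assumed combined objective: constraint violation plus weighted model energy.
        return constraint_cost(x, target_density) + 0.01 * free_energy(x, W, b, c)

    cur = cost(v)
    best_v, best = v.copy(), cur
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-6  # linear cooling schedule
        cand = gibbs_step(v, W, b, c, rng)    # proposal from the model
        # Gradient-descent step on the constraint, kept inside [0, 1].
        cand = np.clip(cand - lr * constraint_grad(cand, target_density), 0.0, 1.0)
        new = cost(cand)
        # Annealed acceptance: always take improvements, sometimes take worse states.
        if new <= cur or rng.random() < np.exp((cur - new) / temp):
            v, cur = cand, new
            if cur < best:
                best_v, best = v.copy(), cur
    return best_v, best

# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(1)
n_visible, n_hidden = 32, 16
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
v_init, cost_init = constrained_sample(W, b, c, target_density=0.2, steps=0)
v_final, cost_final = constrained_sample(W, b, c, target_density=0.2, steps=200)
```

Because the function returns the best state visited rather than the last one, the final cost can never exceed the initial cost, which makes the annealing behaviour easy to check on toy data.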
