Discriminative learning and inference in the Recurrent Temporal RBM for melody modelling

We are interested in modelling musical pitch sequences in melodies represented in symbolic form. The task is to learn a model that predicts the probability distribution over the possible pitch values of the next note in a melody, given the notes leading up to it. For this task, we propose the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM). It is obtained by carrying out discriminative learning and inference, as put forward in the Discriminative RBM (DRBM), in a temporal setting by incorporating the recurrent structure of the Recurrent Temporal RBM (RTRBM). The model is evaluated on the cross entropy of its predictions using a corpus containing 8 datasets of folk and chorale melodies, and is compared with n-grams and other standard connectionist models. Results show that the RTDRBM achieves better predictive performance than the other models, and that the improvement is statistically significant.
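To make the prediction task and its evaluation measure concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It computes the standard DRBM class-conditional distribution p(y | x) ∝ exp(d_y + Σ_j softplus(c_j + U_jy + Σ_i W_ji x_i)) and the mean cross entropy of a set of next-pitch predictions. Several things here are assumptions made for illustration: the recurrent connections that distinguish the RTDRBM are omitted, the context is a fixed window of one-hot encoded pitches, the parameters are random rather than learned, cross entropy is reported in bits (base-2 logarithm), and all names (`drbm_predict`, `cross_entropy_bits`, `encode_context`) are hypothetical.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def drbm_predict(x, W, U, c, d):
    """Return p(y | x) for a (non-recurrent) DRBM.

    x: (n_in,) input vector; W: (n_hid, n_in); U: (n_hid, n_classes);
    c: (n_hid,) hidden biases; d: (n_classes,) class biases.
    """
    # Hidden pre-activation for every candidate class y: shape (n_hid, n_classes).
    act = (c + W @ x)[:, None] + U
    log_unnorm = d + softplus(act).sum(axis=0)
    log_unnorm = log_unnorm - log_unnorm.max()  # guard against overflow
    p = np.exp(log_unnorm)
    return p / p.sum()


def cross_entropy_bits(probs, targets):
    """Mean negative log2-probability assigned to the true next pitches."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=int)
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log2(picked)))


def encode_context(prev_pitches, n_pitches):
    # Concatenate one-hot vectors for each pitch in the context window.
    x = np.zeros(len(prev_pitches) * n_pitches)
    for k, pitch in enumerate(prev_pitches):
        x[k * n_pitches + pitch] = 1.0
    return x


# Toy setup (assumed values): 5 pitch classes, context of 2 notes, 8 hidden units.
n_pitches, context, n_hid = 5, 2, 8
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n_hid, n_pitches * context))
U = 0.1 * rng.standard_normal((n_hid, n_pitches))
c = np.zeros(n_hid)
d = np.zeros(n_pitches)

melody = [0, 2, 4, 2, 0, 1, 3]  # pitch indices of a made-up melody
preds, targets = [], []
for t in range(context, len(melody)):
    x = encode_context(melody[t - context:t], n_pitches)
    preds.append(drbm_predict(x, W, U, c, d))
    targets.append(melody[t])

print("cross entropy (bits):", cross_entropy_bits(preds, targets))
```

Lower cross entropy means the model assigns higher probability to the notes that actually occur, which is the sense in which the models in the comparison are ranked. A model that predicts a uniform distribution over K pitches scores log2(K) bits, which gives a simple reference point.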
