Predicting Expressive Dynamics in Piano Performances using Neural Networks

This paper presents a model for predicting expressive accentuation in piano performances with neural networks. Restricted Boltzmann Machines (RBMs) learn features from performance data, and these features are then used to predict performed loudness. Feature learning uses data describing more than 6,000 musical pieces; prediction training uses two datasets, both recorded on a Bösendorfer piano (which accurately measures note onset and offset times and key velocities), but covering different compositions performed by different pianists. The resulting model is evaluated by predicting note velocities for unseen performances. Our approach differs from earlier work in several ways: (1) an additional input representation based on a local history of velocity values, (2) RBMs trained to yield sparse hidden activations, (3) increased network connectivity through skip connections, and (4) more training data. These modifications produce a network that outperforms the state of the art on the same data and learns more descriptive features, which can be used to render performances or to gain insight into which aspects of a musical piece influence its performance.
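The pipeline sketched above (RBM feature learning with a sparsity penalty, then velocity prediction from both the learned features and the raw inputs via a skip connection) can be illustrated with a minimal NumPy toy example. This is an assumption-laden sketch, not the paper's implementation: the hyperparameters, the CD-1 training loop, the synthetic "score feature" data, and the linear read-out are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseRBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)
    plus a simple penalty pushing mean hidden activations toward a target
    sparsity level (illustrative stand-in for the paper's sparsity training)."""

    def __init__(self, n_visible, n_hidden, sparsity_target=0.05,
                 sparsity_cost=0.1, lr=0.05):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.rho = sparsity_target
        self.lam = sparsity_cost
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        """One CD-1 update on a batch; returns reconstruction error."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        dW = (v0.T @ h0 - v1.T @ h1) / n
        # Sparsity term: nudge mean hidden activation toward rho.
        sparsity_grad = self.rho - h0.mean(axis=0)
        self.W += self.lr * (dW + self.lam * sparsity_grad[np.newaxis, :])
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * ((h0 - h1).mean(axis=0) + self.lam * sparsity_grad)
        return np.mean((v0 - v1) ** 2)

# Toy binary "score feature" vectors (hypothetical data, 200 notes x 24 features).
X = (rng.random((200, 24)) < 0.3).astype(float)
rbm = SparseRBM(n_visible=24, n_hidden=12)
for _ in range(100):
    rbm.cd1_step(X)

# Skip connection: the velocity predictor sees the learned hidden features
# concatenated with the raw inputs (plus a bias column), then a linear read-out.
H = rbm.hidden_probs(X)
Z = np.hstack([H, X, np.ones((len(X), 1))])
y = X[:, :3].sum(axis=1) * 0.1 + 0.5   # synthetic target velocities
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
pred = Z @ w
```

In this sketch the skip connection is simply feature concatenation, so the read-out can exploit raw score features directly even where the RBM features are uninformative; the paper's actual network architecture and velocity-history inputs are richer than this linear toy.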
