Evaluating Language Models of Tonal Harmony

This study borrows and extends probabilistic language models from natural language processing to discover the syntactic properties of tonal harmony. Language models come in many shapes and sizes, but their central purpose is always the same: to predict the next event in a sequence of letters, words, notes, or chords. However, few studies employing such models have evaluated state-of-the-art architectures on a large-scale corpus of Western tonal music; most instead rely on relatively small datasets of chord annotations from contemporary genres such as jazz, pop, and rock. Using symbolic representations of prominent instrumental genres from the common-practice period, this study applies a flexible, data-driven encoding scheme to (1) evaluate Finite Context (or n-gram) models and Recurrent Neural Networks (RNNs) in a chord prediction task; (2) compare the predictive accuracy of the best-performing models for chord onsets in each of the selected datasets; and (3) explain differences between the two model architectures in a regression analysis. We find that Finite Context models using the Prediction by Partial Match (PPM) algorithm outperform RNNs, especially for the piano datasets, and the regression model suggests that RNNs struggle with particularly rare chord types.
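To make the chord prediction task concrete, the following is a minimal sketch of a finite-context (n-gram) predictor that falls back to shorter contexts when the longest one was never observed. It is not the study's implementation: full PPM blends the predictions of several context lengths using escape probabilities, whereas this sketch simply takes the most frequent continuation of the longest matching context. The function names `train` and `predict`, the Roman-numeral chord labels, and the toy corpus are all assumptions made for illustration and do not reflect the study's encoding scheme or data.

```python
from collections import Counter, defaultdict


def train(sequences, order=2):
    """Count continuations for every context of length 0..order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, chord in enumerate(seq):
            for k in range(order + 1):
                if i - k >= 0:
                    # Context = the k chords immediately preceding position i.
                    counts[tuple(seq[i - k:i])][chord] += 1
    return counts


def predict(counts, history, order=2):
    """Return the most frequent continuation of the longest context seen in training.

    Simplified longest-match back-off (an assumption of this sketch), not PPM's
    escape-probability blending. Returns None if nothing was ever counted.
    """
    for k in range(min(order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None


# Hypothetical toy corpus of chord progressions (illustrative labels only).
corpus = [
    ["I", "IV", "V", "I"],
    ["I", "ii", "V", "I"],
    ["vi", "IV", "V", "I"],
]
counts = train(corpus, order=2)

seen = predict(counts, ["IV", "V"])      # context observed in training
backoff = predict(counts, ["iii", "V"])  # unseen pair, falls back to ("V",)
```

In the second call the two-chord context never occurs in the toy corpus, so the predictor backs off to the one-chord context. Rare events are where such back-off matters most, which is the kind of case the abstract's regression analysis singles out.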
