Melody extraction in symphonic classical music: a comparative study of mutual agreement between humans and algorithms

This work addresses melody extraction from symphonic music recordings, where the term ‘melody’ is understood as ‘the single (monophonic) pitch sequence that a listener might reproduce if asked to whistle or hum a piece of polyphonic music and that a listener would recognise as being the “essence” of that music when heard in comparison’. Melody extraction algorithms are commonly evaluated by comparing the pitch sequences they estimate against a “ground truth” created by humans. To gather evaluation material for our target repertoire (classical music excerpts for large ensembles), we recorded people singing along with the music. We analyse these recordings together with the output of state-of-the-art automatic melody extraction methods in order to study the agreement between humans and algorithms. Agreement is assessed with standard measures that compare pitch sequences frame by frame, mainly chroma accuracy, which ignores octave information. We also study how this agreement correlates with properties of the musical excerpts (e.g. melodic density, tessitura, complexity) and of the subjects (e.g. musical background, degree of knowledge of each piece). We confirm the challenges of melody extraction for this particular repertoire, and we identify note density and pitch complexity as the melodic features most correlated with accuracy and mutual agreement for both humans and algorithms.
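As an illustration of the frame-based chroma accuracy measure mentioned above, the following is a minimal sketch rather than the evaluation code used in this study. It assumes that both pitch sequences are sampled on the same frame grid, that a value of 0 Hz marks an unvoiced frame, and that a frame counts as correct when the octave-folded difference is below 50 cents (a common convention in melody extraction evaluation, not stated in the abstract). The function name `chroma_accuracy` and its parameters are hypothetical.

```python
import math


def chroma_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames matched in pitch class.

    ref_hz, est_hz: per-frame pitch values in Hz on the same frame grid;
    0 (or any non-positive value) is assumed to mean "unvoiced".
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("sequences must have the same number of frames")

    voiced = 0
    correct = 0
    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            continue  # only frames where the reference is voiced are scored
        voiced += 1
        if est <= 0:
            continue  # an unvoiced estimate on a voiced frame is a miss
        # Pitch difference in cents, then folded into [-600, 600) so that
        # octave errors (multiples of 1200 cents) are ignored.
        diff = 1200.0 * math.log2(est / ref)
        folded = diff - 1200.0 * math.floor(diff / 1200.0 + 0.5)
        if abs(folded) < tolerance_cents:
            correct += 1
    return correct / voiced if voiced else 0.0


# Toy example: an octave error and a slightly sharp frame are accepted,
# a fifth above and a missing estimate are not.
reference = [440.0, 440.0, 440.0, 0.0, 220.0]
estimate = [880.0, 445.0, 660.0, 300.0, 0.0]
score = chroma_accuracy(reference, estimate)
```

In this toy example, two of the four voiced reference frames match in pitch class, so `score` is 0.5. The same function can compare a sung pitch track against an algorithm's output, or two subjects against each other, which is one simple way to quantify mutual agreement.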
