End-to-end musical key estimation using a convolutional neural network

We present an end-to-end system for musical key estimation based on a convolutional neural network. The proposed system not only outperforms existing key estimation methods from the academic literature; it can also learn a unified model for diverse musical genres that performs comparably to existing systems specialised for specific genres. Our experiments confirm that genres differ in their interpretation of tonality, so a system tuned for, e.g., pop music performs poorly on pieces of electronic music. They also reveal that such cross-genre setups produce specific types of error (predicting the relative or parallel minor). However, with the data-driven approach proposed in this paper, we can train models that handle multiple musical styles adequately, without major losses in accuracy.
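The two error types named in the abstract can be made concrete with a small sketch. It is not the authors' evaluation code: the label format ("C major", "A minor"), the sharp-only pitch-class spelling, and the function names `parse_key` and `classify_error` are all assumptions made for illustration. A relative error confuses a major key with the minor key three semitones below it (or the reverse), and a parallel error keeps the tonic but flips the mode.

```python
# Illustrative sketch only: label format and function names are assumptions,
# not taken from the paper.

# Pitch classes numbered 0-11 starting from C, spelled with sharps only.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def parse_key(label):
    """Split a label such as 'C major' into (tonic pitch class, mode)."""
    tonic, mode = label.split()
    return PITCH_CLASSES.index(tonic), mode


def classify_error(reference, estimate):
    """Name the relation between a reference key and an estimated key."""
    ref_tonic, ref_mode = parse_key(reference)
    est_tonic, est_mode = parse_key(estimate)

    if (ref_tonic, ref_mode) == (est_tonic, est_mode):
        return "correct"
    # Same tonic, different mode (e.g. C major vs. C minor).
    if ref_tonic == est_tonic:
        return "parallel"
    # Relative minor lies 3 semitones below the major tonic (9 above, mod 12).
    if ref_mode == "major" and est_mode == "minor" and est_tonic == (ref_tonic + 9) % 12:
        return "relative"
    # Relative major lies 3 semitones above the minor tonic.
    if ref_mode == "minor" and est_mode == "major" and est_tonic == (ref_tonic + 3) % 12:
        return "relative"
    return "other"


print(classify_error("C major", "A minor"))  # relative
print(classify_error("C major", "C minor"))  # parallel
```

Counting these categories separately for each pairing of training and test genre is one way to see whether cross-genre errors cluster on the relative or parallel key.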
