Automatic Audio Chord Recognition With MIDI-Trained Deep Feature and BLSTM-CRF Sequence Decoding Model

With advances in machine learning, data-driven feature extraction and sequence modeling approaches are being widely explored for automatic chord recognition. A current bottleneck is the shortage of annotated data for training robust acoustic models, since hand-annotating time-synchronized chord labels requires professional musical skill and considerable labor. To cope with this limitation, we propose a convolutional neural network (CNN) based deep feature extractor that is trained on a large set of time-synchronized Musical Instrument Digital Interface (MIDI) and audio data pairs and can robustly estimate the pitch class activations of real-world music recordings. The CNN feature extractor, combined with a bidirectional long short-term memory conditional random field (BLSTM-CRF) decoding model, forms the proposed hybrid system for automatic chord recognition. Experiments show that the proposed model is suitable for both regular major/minor triad classification and larger-vocabulary chord recognition, and that it outperforms other state-of-the-art chord recognition systems.
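To make the idea of MIDI-derived training targets concrete, the following is a minimal sketch of how frame-level pitch class activations could be computed from aligned MIDI notes. It is an illustration under stated assumptions, not the method used in the paper: the function name `pitch_class_targets`, the `(onset, offset, pitch)` note tuples, the fixed hop size, and the binary 12-dimensional target format are all hypothetical choices made here.

```python
import math


def pitch_class_targets(notes, n_frames, hop_seconds):
    """Build binary 12-dimensional pitch class targets per frame.

    notes: list of (onset_sec, offset_sec, midi_pitch) tuples (assumed format).
    n_frames: number of analysis frames in the aligned audio.
    hop_seconds: time between consecutive frames (assumed constant).
    """
    targets = [[0] * 12 for _ in range(n_frames)]
    for onset, offset, pitch in notes:
        first = max(0, int(onset / hop_seconds))
        last = min(n_frames, int(math.ceil(offset / hop_seconds)))
        for t in range(first, last):
            # MIDI pitch modulo 12 gives the pitch class (0 = C, ..., 11 = B).
            targets[t][pitch % 12] = 1
    return targets


# Hypothetical example: a C major triad for one second, then a single A.
example_notes = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67), (1.0, 2.0, 57)]
example_targets = pitch_class_targets(example_notes, n_frames=4, hop_seconds=0.5)
```

In this sketch each frame receives a 1 for every pitch class that sounds during it, so a feature extractor trained against such targets would learn to predict which pitch classes are active in the audio, regardless of octave.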
