Music chord recognition from audio data using bidirectional encoder-decoder LSTMs

In this paper, we discuss methods for chord recognition based on long short-term memory recurrent neural networks (LSTM-RNNs). Chord progressions play an important role in the generation process of music: music processing systems that include a model of chord progressions achieve high accuracies in tasks such as music structure analysis, multi-pitch analysis, and automatic composition or accompaniment. In previous research, chord progressions were obtained with rule-based approaches or modeled using stochastic methods such as hidden Markov models or probabilistic context-free grammars, with pitch patterns regarded as the observations emitted by the hidden states of the chord progression model. Recently, convolutional neural networks have been used for chord recognition with considerable success. LSTM networks, in turn, have been shown to be suitable for generating chord progressions, since they process time-series data very well. The purpose of this study is to evaluate and compare three types of LSTM networks based on bidirectional and encoder-decoder structures with regard to their chord recognition performance. To extract more effective features for chord recognition, we use a constant-Q transform and specmurt analysis to suppress overtone components, and chroma vectorization to reduce the feature dimensionality. The evaluation results show that the encoder-decoder LSTM learns the relationship between the observed chroma vectors and the associated chord progression more effectively than simpler LSTM networks.
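The chroma vectorization step mentioned above can be illustrated with a minimal sketch: a constant-Q spectrum with a fixed number of bins per octave is folded into a 12-dimensional chroma vector by summing magnitude across octaves for each pitch class, then normalizing. The function name and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def chroma_vector(cq_magnitudes, bins_per_octave=12):
    """Fold a constant-Q magnitude spectrum into a normalized chroma vector.

    Assumes bin 0 corresponds to pitch class 0 and that the spectrum uses
    `bins_per_octave` bins per octave (12 here, i.e. one bin per semitone).
    """
    chroma = np.zeros(12)
    for i, m in enumerate(cq_magnitudes):
        # Map each constant-Q bin to its pitch class and accumulate energy.
        chroma[(i * 12 // bins_per_octave) % 12] += m
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma

# Example: energy only at the same pitch class in two octaves
spec = np.zeros(36)
spec[0] = 1.0    # e.g. C3
spec[12] = 1.0   # e.g. C4
print(chroma_vector(spec))  # all energy collapses onto pitch class 0
```

This dimensionality reduction (from hundreds of constant-Q bins down to 12) is what makes the chroma representation a compact input for the LSTM models compared in the study.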
