Audio Chord Recognition with Recurrent Neural Networks

In this paper, we present an audio chord recognition system based on a recurrent neural network. The audio features are obtained from a deep neural network optimized with a combination of chromagram targets and chord information, and aggregated over different time scales. In contrast to other existing approaches, our system incorporates acoustic and musicological models under a single training objective. We devise an efficient algorithm to search for the global mode of the output distribution while taking long-term dependencies into account. The resulting method is competitive with state-of-the-art approaches on the MIREX dataset in the major/minor prediction task.
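The abstract does not describe how its search procedure works. Purely as a generic illustration, and not the authors' algorithm, plain beam search is a standard way to approximate the mode of an output distribution in which each frame's prediction depends on the entire label history. In that setting exact Viterbi decoding does not apply in general. The sketch below assumes a hypothetical `step_log_probs(history, t)` callback that stands in for the trained network. The toy model (`make_toy_model`, `FRAME_PROBS`, `stay_bonus`) is invented for illustration. It combines per-frame "acoustic" probabilities with a crude "musicological" preference for repeating the previous chord.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface standing in for a trained network: given the labels
# chosen so far (the full history) and frame index t, return log-probabilities
# over the chord vocabulary for frame t.
StepFn = Callable[[Sequence[int], int], List[float]]


def beam_search(step_log_probs: StepFn, num_frames: int,
                beam_width: int) -> Tuple[List[int], float]:
    """Approximate the most likely label sequence with plain beam search.

    Every hypothesis carries its whole history, so the model may condition on
    arbitrarily long context; exact Viterbi decoding does not apply in general.
    """
    beam: List[Tuple[float, List[int]]] = [(0.0, [])]  # (log-prob, labels)
    for t in range(num_frames):
        candidates = []
        for score, labels in beam:
            for label, lp in enumerate(step_log_probs(labels, t)):
                candidates.append((score + lp, labels + [label]))
        # Keep only the beam_width highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    best_score, best_labels = beam[0]
    return best_labels, best_score


def sequence_log_prob(step_log_probs: StepFn, labels: Sequence[int]) -> float:
    """Score a complete label sequence under the same autoregressive model."""
    return sum(step_log_probs(labels[:t], t)[label]
               for t, label in enumerate(labels))


def make_toy_model(frame_probs: List[List[float]],
                   stay_bonus: float = 2.0) -> StepFn:
    """Hypothetical toy model: per-frame 'acoustic' probabilities plus a bonus
    for repeating the previous chord (a stand-in for a musicological prior)."""
    def step(history: Sequence[int], t: int) -> List[float]:
        scores = [math.log(p) for p in frame_probs[t]]
        if history:
            scores[history[-1]] += stay_bonus
        # Log-softmax so each step returns a normalized distribution.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        return [s - log_z for s in scores]
    return step


# Three frames, three hypothetical chord classes.
FRAME_PROBS = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.7, 0.2, 0.1]]
toy = make_toy_model(FRAME_PROBS)
labels, score = beam_search(toy, len(FRAME_PROBS), beam_width=2)
print(labels)  # → [0, 0, 0]
```

With `beam_width=2`, the decoded sequence is `[0, 0, 0]`. Taking the per-frame argmax of `FRAME_PROBS` would instead give `[0, 1, 0]`, because the transition preference smooths out the isolated change. A wider beam costs more computation but approximates the global mode more closely. The toy model only looks at the previous label, but the interface allows conditioning on the whole history, as a recurrent network's hidden state would.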