Robust Downbeat Tracking Using an Ensemble of Convolutional Networks

In this paper, we present a novel state-of-the-art system for automatic downbeat tracking from music signals. The audio signal is first segmented into frames synchronized at the tatum level of the music. We then extract different kinds of features based on harmony, melody, rhythm, and bass content to feed convolutional neural networks that are adapted to take advantage of the characteristics of each feature. This ensemble of neural networks is combined to obtain one downbeat likelihood per tatum. The downbeat sequence is finally decoded with a flexible and efficient temporal model that exploits the assumed metrical continuity of a song. We evaluate our system on a large corpus of nine datasets, compare its performance to four other published algorithms, and obtain a significant improvement of 16.8 percentage points over the second-best system, at a moderate overall cost in training and testing. The influence of each step of the method is studied to show its strengths and shortcomings.
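To make the pipeline concrete, here is a minimal sketch, in Python, of the last two stages described above: fusing the per-tatum downbeat likelihoods produced by the feature-specific networks, and decoding the downbeat sequence with a position-in-bar Viterbi that encodes metrical continuity. The function names, the plain averaging fusion, and the fixed set of candidate bar lengths are illustrative assumptions, not the paper's exact ensemble combination or temporal model.

```python
import numpy as np

def combine_likelihoods(per_network_probs):
    """Average per-tatum downbeat probabilities from the network ensemble.

    per_network_probs: list of 1-D arrays, one per feature-specific CNN,
    each giving P(downbeat) for every tatum. Simple averaging is an
    assumption; the paper's fusion may be weighted differently.
    """
    return np.mean(np.stack(per_network_probs, axis=0), axis=0)

def decode_downbeats(probs, bar_lengths=(3, 4, 6, 8), eps=1e-9):
    """Toy Viterbi decoding of the downbeat sequence.

    States are positions inside a bar of fixed length (in tatums); the
    transition model simply advances the position by one tatum per frame,
    which is one way to encode the assumed metrical continuity. The
    best-scoring candidate bar length wins.
    """
    log_p = np.log(np.clip(probs, eps, 1.0))          # log P(downbeat)
    log_q = np.log(np.clip(1.0 - probs, eps, 1.0))    # log P(no downbeat)
    n = len(probs)
    best_score, best_path = -np.inf, None
    for B in bar_lengths:
        # Emission: downbeat likelihood at bar position 0, its complement elsewhere.
        emit = np.tile(log_q[:, None], (1, B))
        emit[:, 0] = log_p
        delta = emit[0].copy()                        # any starting phase is allowed
        back = np.zeros((n, B), dtype=int)
        for t in range(1, n):
            prev = np.roll(delta, 1)                  # predecessor of position i is (i - 1) mod B
            back[t] = np.roll(np.arange(B), 1)
            delta = prev + emit[t]
        score = delta.max()
        if score > best_score:
            path = np.zeros(n, dtype=int)
            path[-1] = int(delta.argmax())
            for t in range(n - 1, 0, -1):             # backtrack the winning phase
                path[t - 1] = back[t, path[t]]
            best_score, best_path = score, path
    return np.flatnonzero(best_path == 0)             # tatum indices decoded as downbeats

# Usage: downbeats = decode_downbeats(combine_likelihoods([p_harmony, p_melody, p_rhythm, p_bass]))
```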
