Deep scattering transform applied to note onset detection and instrument recognition

Automatic Music Transcription (AMT) is one of the oldest and best-studied problems in music information retrieval. Within this challenging research field, onset detection and instrument recognition play important roles in transcription systems, as they help, respectively, to determine the exact onset times of notes and to recognize the corresponding instrument sources. The aim of this study is to explore the usefulness of multiscale scattering operators for these two tasks on plucked string instrument and piano music. After reviewing the theoretical background and illustrating the key features of this sound representation method, we evaluate its performance against other classical sound representations. Using both MIDI-driven datasets with real instrument samples and real musical pieces, scattering is shown to outperform the other sound representations on these AMT subtasks, highlighting its richer sound representation and its invariance properties.
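The abstract does not detail the pipeline. As a rough illustration only, the following NumPy sketch computes first-order scattering coefficients S1 x(t, λ) = |x ⋆ ψ_λ| ⋆ φ_T (the standard first-order definition: a wavelet modulus followed by low-pass averaging) and derives a simple onset detection function from them. Everything else is an assumption chosen for illustration and is not the authors' configuration or method: the Gaussian filter shapes, the parameters (T = 512, q = 8, 32 bands), the hypothetical helper names (`gabor_filterbank`, `onset_detection_function`, `pick_peaks`), the spectral-flux-style detection function, and the peak picker.

```python
import numpy as np


def gabor_filterbank(n, num_filters=32, q=8, xi_max=0.35):
    """Analytic Gaussian band-pass filters, defined in the frequency domain.

    Assumption: Morlet/Gabor-like responses whose centre frequencies xi
    (cycles/sample) are spaced q per octave below xi_max, with bandwidth
    xi / q, i.e. an approximately constant-Q filter bank.
    """
    f = np.fft.fftfreq(n)
    xi = xi_max * 2.0 ** (-np.arange(num_filters) / q)
    sigma = xi / q
    return np.exp(-((f[None, :] - xi[:, None]) ** 2) / (2.0 * sigma[:, None] ** 2))


def first_order_scattering(x, T=512, num_filters=32, q=8):
    """S1 x(t, lambda) = |x * psi_lambda| * phi_T, sampled every T // 2 samples."""
    n = len(x)
    f = np.fft.fftfreq(n)
    psi_hat = gabor_filterbank(n, num_filters, q)
    # Wavelet modulus (scalogram) for every band, via circular FFT convolution.
    U1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * psi_hat, axis=1))
    # Assumed Gaussian low-pass phi_T: time-domain std T / 4, unit gain at DC.
    phi_hat = np.exp(-2.0 * np.pi ** 2 * (T / 4.0) ** 2 * f ** 2)
    S1 = np.real(np.fft.ifft(np.fft.fft(U1, axis=1) * phi_hat[None, :], axis=1))
    # Clip tiny negative round-off values, then subsample in time.
    return np.maximum(S1, 0.0)[:, :: T // 2]


def onset_detection_function(S1):
    """Half-wave rectified increase between consecutive frames, summed over bands."""
    return np.maximum(np.diff(S1, axis=1), 0.0).sum(axis=0)


def pick_peaks(odf, rel_threshold=0.5):
    """Indices of local maxima of the ODF that exceed rel_threshold * max(odf)."""
    thr = rel_threshold * odf.max()
    return [k for k in range(1, len(odf) - 1)
            if odf[k] > thr and odf[k] >= odf[k - 1] and odf[k] > odf[k + 1]]


# Usage on a synthetic signal: two gated sinusoids surrounded by silence.
n, T = 16384, 512
hop = T // 2
t = np.arange(n)
x = np.zeros(n)
for f0, start in [(0.05, 4224), (0.1, 10368)]:
    seg = slice(start, start + 4000)
    x[seg] = np.sin(2 * np.pi * f0 * (t[seg] - start))

S1 = first_order_scattering(x, T=T)
odf = onset_detection_function(S1)
# ODF index k measures the change between frames k and k + 1, i.e. around (k + 0.5) * hop.
onsets = [(k + 0.5) * hop for k in pick_peaks(odf)]
print(onsets)
```

The averaging by φ_T makes S1 stable to small time shifts but blurs fast temporal structure, so onsets can only be located to within roughly one hop. The sketch stops at first order; second-order coefficients ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ_T recover part of the modulation and transient information that the averaging removes.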
