A tutorial on onset detection in music signals

Note onset detection and localization are useful in a number of analysis and indexing techniques for musical signals. The usual way to detect onsets is to look for "transient" regions in the signal, a notion that admits several definitions: a sudden burst of energy, a change in the short-time spectrum of the signal, a change in its statistical properties, and so on. The goal of this paper is to review, categorize, and compare some of the most commonly used techniques for onset detection, and to present possible enhancements. We discuss two families of methods. The first relies on explicitly predefined signal features: the signal's amplitude envelope, spectral magnitudes and phases, and time-frequency representations. The second relies on probabilistic signal models: model-based change point detection, surprise signals, and related approaches. Using a selection of test cases, we provide some guidelines for choosing the appropriate method for a given application.
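As a rough illustration of the feature-based family, the sketch below computes one common detection function, the half-wave rectified change in short-time spectral magnitude (often called spectral flux), and picks its peaks. It is a minimal example under stated assumptions, not the procedure of the paper: the function names `spectral_flux` and `pick_onsets`, the frame length, the hop size, and the fixed relative threshold `delta` are all illustrative choices.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Half-wave rectified spectral flux, one value per pair of adjacent frames.

    Assumed parameters: frame_len and hop are illustrative defaults.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Keep only increases in magnitude between consecutive frames.
    diff = np.diff(mag, axis=0)
    return np.sum(np.maximum(diff, 0.0), axis=1)

def pick_onsets(flux, hop, sr, delta=0.1):
    """Return onset times in seconds at local maxima of the flux.

    A peak must exceed delta times the global maximum; this fixed
    threshold is an assumption made for the sketch.
    """
    threshold = delta * flux.max()
    peaks = [n for n in range(1, len(flux) - 1)
             if flux[n] > threshold
             and flux[n] >= flux[n - 1]
             and flux[n] > flux[n + 1]]
    # flux[n] compares frame n + 1 with frame n, so report the start of frame n + 1.
    return [(n + 1) * hop / sr for n in peaks]

# Usage on a synthetic signal: half a second of silence, then a 440 Hz tone.
sr = 8000
t = np.arange(sr) / sr
tone = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
onsets = pick_onsets(spectral_flux(tone), hop=512, sr=sr)
```

The time resolution of such a detector is bounded by the hop size, so the reported onset can only be expected to fall within about one frame of the true attack. Real recordings would typically call for an adaptive rather than a fixed threshold.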
