Musical onset detection on Carnatic percussion instruments

In this work, we explore the task of musical onset detection in Carnatic music for five major percussion instruments: the mridangam, ghatam, kanjira, morsing and thavil. We examine the musical characteristics of the strokes of each instrument, which motivate the challenges in designing an onset detection algorithm. We propose a non-model-based algorithm for this task that uses the minimum-phase group delay. The music signal is treated as an amplitude-frequency modulated (AM-FM) waveform, and its envelope is extracted using the Hilbert transform. Minimum-phase group delay processing is then applied to accurately determine the onset locations. The algorithm is tested on a large dataset containing both controlled and concert recordings (tani avarthanams). Its performance is observed to be comparable with that of the state-of-the-art technique that employs machine learning algorithms.
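The two processing stages named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details, so the function names, the root-compression exponent `gamma`, the truncation parameter `keep_fraction`, and the choice to treat the envelope as one half of a symmetric magnitude spectrum are all assumptions made for illustration. The sketch assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.signal import hilbert

EPS = 1e-12  # guards against division by zero


def hilbert_envelope(x):
    """Amplitude envelope: magnitude of the analytic signal of x."""
    return np.abs(hilbert(x))


def group_delay(x, n_fft):
    """Group delay of sequence x without phase unwrapping.

    Uses tau(w) = (X_R * Y_R + X_I * Y_I) / |X|^2, where X = DFT{x[n]}
    and Y = DFT{n * x[n]}.
    """
    n = np.arange(len(x))
    X = np.fft.fft(x, n_fft)
    Y = np.fft.fft(n * x, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + EPS)


def envelope_group_delay(envelope, gamma=0.5, keep_fraction=0.1):
    """Group delay of a causal sequence derived from the envelope.

    Assumed procedure (hypothetical parameters gamma, keep_fraction):
    1. normalise and root-compress the envelope,
    2. mirror it so that it looks like a real, even magnitude spectrum,
    3. inverse-transform and keep only the leading causal samples,
    4. compute the group delay of that causal sequence.
    """
    n_pts = len(envelope)
    e = np.maximum(envelope / (np.max(envelope) + EPS), 1e-6)
    spec = e ** gamma
    sym = np.concatenate([spec, spec[-2:0:-1]])  # length 2 * n_pts - 2
    cep = np.fft.ifft(sym).real
    keep = max(2, int(keep_fraction * n_pts))
    causal = cep[:keep]
    return group_delay(causal, len(sym))[:n_pts]
```

A typical use would be `gd = envelope_group_delay(hilbert_envelope(x))` for a mono signal `x`, followed by some form of peak picking on `gd`. The peak-picking rule and any thresholds are left out here because the abstract does not specify them.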
