A post-processing of onset detection based on verification with neural network

Onset detection, a primary task in music transcription, aims to find the start time of each note, which is directly associated with beat perception in the auditory system. Researchers have attempted to find a data representation for a universal onset function, but onset detection does not generalize to all cases. For example, in the MIREX challenge, onset detection for solo singing performs worse than for solo instruments every year. This paper presents a post-processing step for singing onset detection whose sole purpose is to reduce falsely detected onsets. In this step, the system checks the candidate onsets picked from the local maxima of the onset function and uses a neural network model to classify the features at each candidate as onset or non-onset, rather than relying on a more complicated onset function. The performance of the network is closely related to the performance of the onset detection. On a public dataset for singing transcription research, the pipeline with post-processing achieves higher onset detection performance than the standard and novelty methods, because it reduces the false alarms produced by feature-based methods. The data-driven approach thus provides an effective way to eliminate spurious peaks, which can further support research on singing transcription and may represent the state of the art in singing onset detection.
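
The verification idea can be illustrated with a minimal sketch. It assumes a precomputed onset function and per-frame features; the function names `pick_peaks` and `verify_onsets`, the threshold values, and the toy `classifier` are hypothetical stand-ins and not the network or features used in the paper.

```python
import numpy as np

def pick_peaks(odf, threshold=0.1):
    """Return frame indices of local maxima of the onset function above threshold."""
    odf = np.asarray(odf, dtype=float)
    idx = np.arange(1, len(odf) - 1)  # interior frames only
    is_peak = (
        (odf[idx] > odf[idx - 1])
        & (odf[idx] >= odf[idx + 1])
        & (odf[idx] > threshold)
    )
    return idx[is_peak]

def verify_onsets(candidates, features, classifier, context=1, min_prob=0.5):
    """Keep only the candidates that the classifier accepts as onsets.

    `classifier` is any callable mapping a feature patch to a score in [0, 1];
    here it stands in for the trained neural network (an assumption).
    """
    features = np.asarray(features, dtype=float)
    kept = []
    for c in candidates:
        lo = max(0, c - context)
        hi = min(len(features), c + context + 1)
        patch = features[lo:hi]  # frames around the candidate
        if classifier(patch) >= min_prob:
            kept.append(int(c))
    return kept

# Toy data: an onset function with three local maxima, one of them spurious.
odf = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 0.2, 0.0, 0.8, 0.1]
features = np.array([0.1, 0.1, 0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.9, 0.9])

# Hypothetical stand-in for the network: mean feature value of the patch.
toy_classifier = lambda patch: float(patch.mean())

candidates = pick_peaks(odf)
onsets = verify_onsets(candidates, features, toy_classifier)
```

In this sketch the post-processing can only remove candidates, never add them, which matches the stated goal of reducing false alarms without redesigning the onset function.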
