Group delay based melody monopitch extraction from music

In this paper, we propose a modified group delay based method for melodic pitch extraction from heterophonic music. The power spectrum of the music signal is first flattened so that the system characteristics are removed and the characteristics of the source are emphasized. The modified group delay function of this flattened spectrum, treated as a time-domain signal, exhibits peaks at multiples of the pitch period. The first three peaks are used to determine the actual pitch period. The performance of the proposed system was evaluated on two datasets, ADC-2004 and LabROSA, and is comparable to that of other magnitude-spectrum-based approaches. The algorithm was also applied to heterophonic music, namely Carnatic music. Since no ground truth is available for Carnatic music, the extracted pitch contours were used to synthesize audio, which a professional musician then evaluated for correctness.
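The three-step pipeline in the abstract (flatten the power spectrum, take the modified group delay of the flattened spectrum, read the pitch period from the first three peaks) can be sketched in pure Python as shown below. This is a minimal sketch under stated assumptions and is not the authors' implementation. The flattening step is assumed to divide the power spectrum by a cepstrally smoothed envelope. The modified group delay uses the common form tau_m = sign(tau)|tau|^alpha with tau = (X_R Y_R + X_I Y_I) / S^(2*gamma), where Y is the DFT of k*x[k] and S is a cepstrally smoothed |X|. Several choices are hypothetical: the parameter values (alpha = 0.4, gamma = 0.9), the lifter sizes, the relative floor, the 0.5 peak threshold, and the rule that combines the first three peaks (the median of t1, t2/2, t3/3). All function names are illustrative.

```python
import cmath
import math
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT via the conjugation identity."""
    n = len(x)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in x])]


def cepstral_envelope(mag, lifter, floor=1e-3):
    """Smooth a non-negative spectrum by keeping only low-quefrency cepstral terms.

    Values below floor * max(mag) are clipped before the log (assumed choice)
    so that spectral nulls do not dominate the smoothed envelope.
    """
    n = len(mag)
    level = max(floor * max(mag), 1e-12)
    cep = [c.real for c in ifft([math.log(max(v, level)) for v in mag])]
    kept = [c if (q <= lifter or q >= n - lifter) else 0.0 for q, c in enumerate(cep)]
    return [math.exp(c.real) for c in fft(kept)]


def modified_group_delay(x, alpha=0.4, gamma=0.9, lifter=4):
    """Modified group delay of the sequence x, treated as a time-domain signal."""
    X = fft(x)
    Y = fft([k * v for k, v in enumerate(x)])  # DFT of k * x[k]
    S = cepstral_envelope([abs(c) for c in X], lifter)  # smoothed |X| replaces |X|^2
    out = []
    for xc, yc, s in zip(X, Y, S):
        tau = (xc.real * yc.real + xc.imag * yc.imag) / s ** (2 * gamma)
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out


def estimate_pitch(frame, fs, fmin=80.0, fmax=500.0, lifter=8, threshold=0.5):
    """Return an f0 estimate in Hz for one frame, or None if fewer than 3 peaks."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    power = [abs(c) ** 2 for c in fft([s * w for s, w in zip(frame, window)])]
    # 1. Flatten: divide out the smooth envelope (system), keep the harmonic comb (source).
    #    The lifter must stay below the shortest expected pitch period in samples.
    env = cepstral_envelope(power, lifter)
    flat = [p / e for p, e in zip(power, env)]
    # 2. Modified group delay of the flattened spectrum: peaks near lags T0, 2*T0, 3*T0, ...
    gd = modified_group_delay(flat)
    # 3. Keep local maxima above threshold * max in a plausible lag range; use the first three.
    lo = max(2, int(fs / fmax))
    hi = min(int(3 * fs / fmin), n // 2 - 1)
    top = max(gd[lo:hi + 1])
    peaks = [t for t in range(lo + 1, hi)
             if gd[t] > gd[t - 1] and gd[t] >= gd[t + 1] and gd[t] >= threshold * top]
    if len(peaks) < 3:
        return None
    t1, t2, t3 = peaks[:3]
    period = statistics.median([t1, t2 / 2, t3 / 3])  # each peak votes for T0 (hypothetical rule)
    return fs / period


if __name__ == "__main__":
    fs = 8000
    # Synthetic test frame: 15 equal-amplitude harmonics of 250 Hz, 512 samples.
    frame = [sum(math.cos(2 * math.pi * h * 250.0 * i / fs) for h in range(1, 16))
             for i in range(512)]
    print(estimate_pitch(frame, fs))
```

For the synthetic 250 Hz frame at 8 kHz, the pitch period is 8000 / 250 = 32 samples. The first three modified group delay peaks should therefore fall near lags 32, 64 and 96. Taking the median of t1, t2/2 and t3/3 lets the estimate tolerate one misplaced peak. It is only one plausible reading of "the first 3 peaks are used".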
