Singing Pitch Extraction from Monaural Polyphonic Songs by Contextual Audio Modeling and Singing Harmonic Enhancement

This paper proposes a novel approach to extracting the pitch of the singing voice from monaural polyphonic songs. A hidden Markov model (HMM) is adopted to model the transitions between temporally adjacent singing pitches, as well as the relationship between the melody and its chord, which is implicitly represented by features extracted from the spectrum. In addition, a second set of features, representing the energy distribution of the enhanced singing harmonic structure, is obtained by applying a normalized sub-harmonic summation technique. Because these two feature sets have complementary characteristics, they are combined in a 2-stream HMM for singing pitch extraction. Quantitative evaluation shows that the proposed system outperforms the compared approaches for singing pitch extraction from polyphonic songs.
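
The two ingredients named in the abstract, a normalized sub-harmonic summation feature and a 2-stream HMM decoded over pitch states, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the normalization (dividing the summed harmonic magnitudes by the number of harmonics below the highest analyzed frequency), the number of harmonics, and the stream weight are all assumptions made for illustration.

```python
import numpy as np

def normalized_subharmonic_summation(mag, freqs, candidates, n_harmonics=8):
    """Score each candidate F0 by the mean spectral magnitude at its harmonics.

    Assumed normalization: the sum is divided by the number of harmonics
    that fall inside the analyzed frequency range.
    """
    scores = np.zeros(len(candidates))
    f_max = freqs[-1]
    for i, f0 in enumerate(candidates):
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics <= f_max]
        # Linearly interpolate the magnitude spectrum at each harmonic.
        amps = np.interp(harmonics, freqs, mag)
        scores[i] = amps.sum() / len(harmonics)
    return scores

def viterbi_two_stream(log_obs_a, log_obs_b, log_trans, log_prior, weight=0.5):
    """Viterbi decoding with two observation streams.

    log_obs_a, log_obs_b: (T, S) per-frame log-likelihoods of each stream.
    log_trans: (S, S) log transition matrix, log_trans[i, j] = log P(j | i).
    weight: hypothetical stream weight; the streams are combined as a
    weighted sum of log-likelihoods.
    """
    log_obs = weight * log_obs_a + (1.0 - weight) * log_obs_b
    n_frames, n_states = log_obs.shape
    delta = log_prior + log_obs[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = delta[:, None] + log_trans       # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)       # best predecessor of each j
        delta = cand[back[t], np.arange(n_states)] + log_obs[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

In this sketch each HMM state would correspond to one candidate pitch (for example, one semitone), one stream would carry the spectrum-based features, and the other would carry the sub-harmonic summation scores. How the paper actually defines the states, trains the observation models, and weights the streams is not stated in the abstract.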
