Two-pitch tracking in co-channel speech using modified group delay functions

Modified group delay functions are gaining significance in the literature for formant estimation, speaker recognition and speech recognition. In particular, group delay functions offer higher spectral resolution than the magnitude spectrum. In this paper, modified group delay functions are used to estimate and track two pitches in concurrent speech. The power spectrum of the speech signal is first flattened to remove the system (vocal tract) characteristics while retaining the source (excitation) characteristics. Group delay analysis of the flattened spectrum is then performed, and the predominant pitch is computed. Next, a comb filter is designed to remove the predominant pitch and its harmonics from the group delay spectrum. The residual spectrum is subjected to group delay analysis once more, and the second candidate pitch is estimated using modified group delay processing. The first- and second-pass pitch trajectories are then corrected by post-processing. The performance of the proposed algorithm was evaluated on two datasets using two metrics: pitch accuracy and the standard deviation of the fine pitch error. Our results show that phase-based processing holds promise for multipitch estimation.
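The spectral flattening step can be illustrated with cepstral smoothing: the low-quefrency part of the real cepstrum approximates the slowly varying system envelope, and dividing the power spectrum by that envelope leaves mostly the harmonic structure of the source. The sketch below assumes this liftering approach, a Hann window, and a hypothetical function `flatten_power_spectrum` with an illustrative cut-off `n_lifter`; the flattening used in the paper may differ in these details.

```python
import numpy as np


def flatten_power_spectrum(frame, n_fft=1024, n_lifter=30, eps=1e-12):
    """Flatten the power spectrum of one frame by cepstral smoothing.

    The low-quefrency cepstral coefficients (assumed cut-off: n_lifter)
    model the slowly varying envelope; dividing it out leaves mainly the
    harmonic fine structure. Returns n_fft // 2 + 1 positive values.
    """
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 + eps
    # Real cepstrum of the log power spectrum (real and symmetric, length n_fft).
    cepstrum = np.fft.irfft(np.log(power), n_fft)
    # Lifter: keep quefrencies 0..n_lifter-1 and their symmetric mirror images.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    # Back to the log-spectral domain: this is the smoothed log envelope.
    log_envelope = np.fft.rfft(cepstrum * lifter, n_fft).real
    return power / np.exp(log_envelope)
```

Because the overall gain of the frame only shifts the zeroth cepstral coefficient, which the lifter keeps, scaling the input leaves the flattened spectrum unchanged.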
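For reference, the modified group delay function is commonly written in terms of the spectrum X(ω) of x(n) and the spectrum Y(ω) of n·x(n): the group delay numerator X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω) is divided by a cepstrally smoothed spectrum S(ω) raised to the power 2γ instead of by |X(ω)|², giving τ(ω), and the result is compressed as τ_m(ω) = sign(τ(ω))·|τ(ω)|^α. A minimal NumPy sketch follows; the name `modified_group_delay`, its defaults (α = 0.4, γ = 0.9, lifter length 8) and the FFT size are illustrative assumptions, not values taken from this paper.

```python
import numpy as np


def modified_group_delay(x, n_fft=512, alpha=0.4, gamma=0.9, n_lifter=8, eps=1e-12):
    """Modified group delay function of a real sequence x on n_fft // 2 + 1 bins."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of x(n)
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of n * x(n)
    # Cepstrally smoothed magnitude spectrum S(w) replaces |X(w)| in the
    # denominator, reducing spikes caused by zeros close to the unit circle.
    cepstrum = np.fft.irfft(np.log(np.abs(X) + eps), n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    S = np.exp(np.fft.rfft(cepstrum * lifter, n_fft).real)
    # Group delay numerator divided by S(w)^(2*gamma).
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2.0 * gamma)
    # Compress the dynamic range with exponent alpha while keeping the sign.
    return np.sign(tau) * np.abs(tau) ** alpha
```

A quick sanity check: for a unit impulse delayed by d samples, |X(ω)| = 1, so S(ω) = 1 and the function returns d^α at every frequency.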
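The two-pass structure (estimate the predominant pitch, suppress it and its harmonics with a comb filter, then re-estimate from the residual) can be sketched independently of the group delay machinery. To keep the example short and checkable, the sketch swaps in plain harmonic summation on the magnitude spectrum as the pitch estimator and applies the comb (notch) filter to that magnitude spectrum; this is not the modified group delay estimator described in the abstract. The function names, the 80 to 400 Hz search range, the five summed harmonics and the 8 Hz notch half-width are all hypothetical choices.

```python
import numpy as np


def harmonic_sum_pitch(mag, freqs, f0_min=80.0, f0_max=400.0, n_harm=5, step=1.0):
    """Return the candidate F0 whose first n_harm harmonics carry the most magnitude."""
    candidates = np.arange(f0_min, f0_max + step, step)
    df = freqs[1] - freqs[0]
    scores = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        # Nearest FFT bin for each harmonic k * f0, k = 1..n_harm.
        bins = np.rint(f0 * np.arange(1, n_harm + 1) / df).astype(int)
        bins = bins[bins < len(mag)]
        scores[i] = mag[bins].sum()
    return candidates[np.argmax(scores)]


def comb_remove(mag, freqs, f0, half_width=8.0):
    """Zero every bin lying within half_width Hz of a harmonic k * f0 (k >= 1)."""
    out = mag.copy()
    k = np.maximum(np.rint(freqs / f0), 1.0)      # index of the nearest harmonic
    out[np.abs(freqs - k * f0) <= half_width] = 0.0
    return out


def two_pass_pitch(frame, fs, n_fft=8192, half_width=8.0):
    """Predominant pitch, comb-filter it out, then the pitch of the residual."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    p1 = harmonic_sum_pitch(mag, freqs)
    residual = comb_remove(mag, freqs, p1, half_width)
    p2 = harmonic_sum_pitch(residual, freqs)
    return p1, p2
```

On a 2048-sample synthetic mixture at 8 kHz of two five-harmonic tones at 110 Hz and 170 Hz (the louder one at 110 Hz), the first pass should return approximately 110 Hz and the second approximately 170 Hz. The notch width is a trade-off: too narrow leaves leakage from the first source, too wide also deletes harmonics of the second source that happen to lie near harmonics of the first.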
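The two evaluation metrics can be computed per reference trajectory as in the sketch below. It assumes that a frame counts as accurate when the relative error is below 20% (a tolerance chosen here for illustration), that unvoiced reference frames are marked with 0 Hz and excluded, and that the fine pitch error is measured in Hz over the accurate frames only; the exact definitions used in the paper may differ, and the function name is hypothetical.

```python
import numpy as np


def pitch_accuracy_and_fine_error(ref_f0, est_f0, tol=0.20):
    """Return (pitch accuracy in %, std of fine pitch error in Hz).

    Assumptions: ref_f0 == 0 marks unvoiced frames (excluded); a voiced frame
    is accurate if |est - ref| / ref < tol; the fine error is est - ref over
    the accurate frames.
    """
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    voiced = ref > 0
    rel_err = np.abs(est[voiced] - ref[voiced]) / ref[voiced]
    correct = rel_err < tol
    accuracy = 100.0 * correct.mean() if voiced.any() else float("nan")
    fine = est[voiced][correct] - ref[voiced][correct]
    fine_std = float(np.std(fine)) if fine.size else float("nan")
    return accuracy, fine_std
```

For example, with reference values [100, 100, 200, 200, 0] Hz and estimates [102, 150, 190, 200, 120] Hz, three of the four voiced frames are accurate, giving 75.0% accuracy and a fine pitch error standard deviation of about 5.25 Hz (the standard deviation of 2, -10 and 0 Hz).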
