Significance of Phase-based Features for Person Recognition Using Humming

This paper presents the use of a person's hum as a biometric cue for the person recognition task. Mel Frequency Cepstral Coefficients (MFCC) are considered state-of-the-art features in voice biometrics. However, MFCC are magnitude-based features and ignore phase information. This paper shows the effectiveness of phase-based information extracted via the Modified Group Delay Function (MODGDF). The features obtained by Mel filtering of the MODGDF spectrum are called Modified Group Delay Cepstral Coefficients (MGDCC). The paper evaluates two fusion strategies, viz., score-level and feature-level fusion. In terms of % Equal Error Rate (EER), the experimental results show that overall performance improves by 3 % when score-level fusion of MFCC and MGDCC is employed, and by 19.78 % with feature-level fusion. These results indicate that incorporating phase information along with magnitude-based features can effectively capture person-specific characteristics in humming.
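The feature pipeline and evaluation described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The MODGDF follows the standard definition tau(k) = (X_R(k) Y_R(k) + X_I(k) Y_I(k)) / S(k)^(2*gamma), compressed as sign(tau) |tau|^alpha, where X is the DFT of x[n], Y is the DFT of n*x[n], and S(k) is a cepstrally smoothed magnitude spectrum. The following are assumptions and are not taken from the paper: the parameter values (alpha = 0.4, gamma = 0.9, lifter length 8), the frame and FFT sizes, 24 Mel filters, 13 coefficients, and the choice to apply the DCT without a log (MODGDF values can be negative). The helpers `fuse_scores`, `fuse_features` and `eer` are hypothetical names. Score-level fusion is shown as a plain weighted sum, and feature-level fusion as frame-wise concatenation.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular Mel filters over the n_fft // 2 + 1 non-negative DFT bins."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def modgdf(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one frame (parameter values are assumptions)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # DFT of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # DFT of n * x[n]
    # Cepstrally smoothed magnitude S(k): zero all but the low-quefrency cepstrum.
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-10), n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(cep, n_fft).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2.0 * gamma)
    return np.sign(tau) * np.abs(tau) ** alpha

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    N = len(x)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * N)) @ x

def mgdcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_filters=24, n_ceps=13):
    """MGDCC sketch: MODGDF -> Mel filterbank -> DCT (no log, as MODGDF can be negative)."""
    window = np.hamming(frame_len)
    fbank = mel_filterbank(n_filters, n_fft, fs)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        feats.append(dct_ii(fbank @ modgdf(frame, n_fft), n_ceps))
    return np.array(feats)

def fuse_scores(s_mfcc, s_mgdcc, w=0.5):
    """Score-level fusion as a weighted sum (w is a hypothetical tuning weight)."""
    return w * np.asarray(s_mfcc) + (1.0 - w) * np.asarray(s_mgdcc)

def fuse_features(f_mfcc, f_mgdcc):
    """Feature-level fusion as frame-wise concatenation of the two feature sets."""
    return np.hstack([f_mfcc, f_mgdcc])

def eer(genuine, impostor):
    """Approximate % EER: the threshold where false accept and false reject rates are closest."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, best_eer = np.inf, 100.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostor trials wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), 100.0 * (far + frr) / 2.0
    return best_eer

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Synthetic 1 s "hum": 150 Hz fundamental with decaying harmonics plus a little noise.
    hum = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 6))
    hum = hum + 0.01 * np.random.default_rng(0).standard_normal(fs)
    print(mgdcc(hum, fs).shape)  # (48, 13)
```

In a full system, the MFCC and MGDCC streams would each feed a speaker model (for example, a polynomial or GMM classifier). `fuse_features` would then be applied before modelling, while `fuse_scores` combines the two systems' outputs after modelling. The fusion weight would normally be tuned on held-out data rather than fixed at 0.5.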