Speaker Identification by Combining Various Vocal Tract and Vocal Source Features

Previously, we proposed a speaker recognition system that combines an MFCC-based vocal tract feature with phase information, which carries rich vocal source information. In this paper, we investigate the effectiveness of combining various vocal tract features (MFCC and LPCC) with vocal source features (phase and LPC residual) for normal-duration and short-duration utterances. The Japanese Newspaper Article Sentences (JNAS) database was used to evaluate the proposed method. The combination of vocal tract and vocal source features achieved a remarkable improvement over the conventional MFCC-based vocal tract feature for both normal-duration and short-duration utterances.
