Source and system features for text independent speaker identification using iterative clustering approach

The main objective of this paper is to explore the effectiveness of perceptual features combined with pitch for text-independent speaker recognition. The combined features are extracted, and training models are built with the K-means clustering procedure. The speaker recognition system is evaluated on clean test speech, and recognition is based on the minimum distance between the test features and the clusters. The algorithm achieves an overall accuracy of 99.675% for the combined features and 98.75% for the perceptual features when identifying a speaker among 8 speakers chosen randomly from 8 different dialect regions of the TIMIT database. It achieves an average accuracy of 96.375% for perceptual linear predictive cepstrum combined with pitch and 95.625% for perceptual linear predictive cepstrum for 8 speakers chosen randomly from the same dialect region. A noteworthy feature of the speaker identification algorithm is that testing is carried out on identical messages for all speakers. In this work, the F-ratio is computed as a theoretical measure to validate the experimental results on speaker recognition.
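
The clustering and minimum-distance decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cluster count of 4, the squared Euclidean distance, and the synthetic 3-dimensional "feature vectors" are all assumptions made for illustration, and real perceptual and pitch features would replace the synthetic frames.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means: returns k centroids (the speaker's training model)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            groups[nearest].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def avg_min_distance(vectors, centroids):
    """Average distance from each test vector to its nearest centroid."""
    return sum(min(sq_dist(v, c) for c in centroids) for v in vectors) / len(vectors)

def identify(test_vectors, models):
    """Pick the speaker whose clusters are closest to the test features."""
    return min(models, key=lambda s: avg_min_distance(test_vectors, models[s]))

def make_frames(center, n, rng, spread=0.5):
    """Hypothetical stand-in for feature extraction: noisy points around a center."""
    return [[c + rng.gauss(0, spread) for c in center] for _ in range(n)]

rng = random.Random(1)
# Hypothetical speakers, each with a synthetic feature distribution.
centers = {
    "spk_a": [0.0, 0.0, 0.0],
    "spk_b": [5.0, 5.0, 5.0],
    "spk_c": [-5.0, 5.0, 0.0],
}
models = {name: kmeans(make_frames(c, 200, rng), k=4) for name, c in centers.items()}

test_frames = make_frames(centers["spk_b"], 50, rng)
predicted = identify(test_frames, models)
print(predicted)
```

Averaging the nearest-centroid distance over all test frames is one common way to turn per-frame distances into a single utterance-level score; the abstract does not state which aggregation the authors used.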
