Cluster and Intrinsic Dimensionality Analysis of the Modified Group Delay Feature for Speaker Classification

Speakers are generally identified using features derived from the Fourier transform magnitude. In our previous work, the modified group delay feature (MODGDF), derived from the Fourier transform phase, has been used effectively for speaker recognition. Although the efficacy of the MODGDF as an alternative to the mel-frequency cepstral coefficients (MFCC) is yet to be established, our earlier work has shown that composite features derived from the MFCC and the MODGDF perform extremely well. In this paper, we investigate the cluster structure of speakers in a lower-dimensional feature space derived from the MODGDF. Three nonlinear dimensionality reduction techniques, Sammon mapping, ISOMAP, and locally linear embedding (LLE), are used to visualize speaker clusters in the lower-dimensional space. We identify the intrinsic dimensionality of both the MODGDF and the MFCC using the elbow technique. We also present the results of speaker identification experiments performed using the MODGDF, the MFCC, and composite features derived from the two.
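The elbow technique is commonly applied to a curve of residual variance plotted against the target dimensionality: the dimension at which the curve stops dropping sharply is taken as the intrinsic dimensionality. The Python sketch below illustrates the idea under stated assumptions and is not the procedure used in this work. Its residual-variance curve comes from PCA, a linear stand-in for the curves that ISOMAP or LLE would produce. It locates the elbow with a maximum-distance-to-chord heuristic, and the function names residual_variance_pca and elbow_index are hypothetical.

```python
import numpy as np


def residual_variance_pca(X, max_dim):
    """Residual variance 1 - (variance explained by the top-d principal
    components) for d = 1..max_dim.

    Assumption: a linear stand-in for the residual-variance curve that
    nonlinear methods such as ISOMAP report against embedding dimension.
    """
    Xc = X - X.mean(axis=0)                   # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to PCA eigenvalues
    explained = np.cumsum(var)[:max_dim] / var.sum()
    return 1.0 - explained


def elbow_index(curve):
    """Return the 1-based dimension at the elbow of a decreasing curve.

    Heuristic (assumed, not taken from the paper): after scaling both axes
    to [0, 1], pick the point farthest from the straight chord joining the
    first and last points of the curve.
    """
    y = np.asarray(curve, dtype=float)
    if y.size < 3:
        raise ValueError("need at least three points to locate an elbow")
    x = np.arange(1, y.size + 1, dtype=float)
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    p0 = np.array([xn[0], yn[0]])
    chord = np.array([xn[-1], yn[-1]]) - p0
    chord /= np.linalg.norm(chord)            # unit vector along the chord
    rel = np.stack([xn, yn], axis=1) - p0     # points relative to chord start
    # Perpendicular distance of each point from the chord (2-D cross product).
    dist = np.abs(rel[:, 0] * chord[1] - rel[:, 1] * chord[0])
    return int(np.argmax(dist)) + 1


if __name__ == "__main__":
    # Synthetic data: 3 latent directions embedded in 10-D plus small noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
    Q, _ = np.linalg.qr(rng.normal(size=(10, 3)))  # 10x3 orthonormal basis
    X = latent @ Q.T + 0.01 * rng.normal(size=(500, 10))
    curve = residual_variance_pca(X, max_dim=10)
    print("estimated intrinsic dimensionality:", elbow_index(curve))
```

Applied to a matrix of MODGDF or MFCC vectors (one row per frame), these functions would give only a rough linear estimate. A nonlinear estimate would replace the PCA curve with the residual variance of an ISOMAP embedding computed at each candidate dimension.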
