An overview of text-independent speaker recognition: From features to supervectors

This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We then elaborate on advanced computational techniques that address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.

[1]  Larry P. Heck,et al.  Handset-dependent background models for robust text-independent speaker recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[3]  Paavo Alku,et al.  On separating glottal source and vocal tract information in telephony speaker verification , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  André Adami,et al.  Modeling prosodic differences for speaker recognition , 2007, Speech Commun..

[5]  Kaliappan Gopalan,et al.  A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion , 1999, IEEE Trans. Speech Audio Process..

[6]  Wai Nang Chan,et al.  Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Eric G. Hansen,et al.  Speaker recognition using phoneme-specific GMMs , 2004, Odyssey.

[8]  Richard J. Mammone,et al.  An analysis of data fusion methods for speaker verification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9]  T.F. Quatieri,et al.  Speaker recognition from coded speech and the effects of score normalization , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[10]  J. Wolf Efficient Acoustic Parameters for Speaker Recognition , 1972 .

[11]  Steve Young,et al.  Corpus-based methods in language and speech processing , 1997 .

[12]  Christian M. Ller Speaker Classification , 2008, Encyclopedia of Biometrics.

[13]  Douglas A. Reynolds,et al.  Modeling of the glottal flow derivative waveform with application to speaker identification , 1999, IEEE Trans. Speech Audio Process..

[14]  Thomas Fang Zheng,et al.  A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification , 2006, Speech Commun..

[15]  Elizabeth Shriberg,et al.  Parameterization of Prosodic Feature Distributions for SVM Modeling in Speaker Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[16]  Les E. Atlas,et al.  EURASIP Journal on Applied Signal Processing 2003:7, 668–675 c ○ 2003 Hindawi Publishing Corporation Joint Acoustic and Modulation Frequency , 2003 .

[17]  Olli Viikki,et al.  Cepstral domain segmental feature vector normalization for noise robust speech recognition , 1998, Speech Commun..

[18]  Rong Tong,et al.  Speaker cluster based GMM tokenization for speaker recognition , 2006, INTERSPEECH.

[19]  David K. Burton,et al.  Text-dependent speaker verification using vector quantization source coding , 1985, IEEE Trans. Acoust. Speech Signal Process..

[20]  Samy Bengio,et al.  Why do multi-stream, multi-band and multi-modal approaches work on biometric user authentication tasks? , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[21]  Sridha Sridharan,et al.  A comparison of fusion techniques in mel-cepstral based speaker identification , 1998, ICSLP.

[22]  Andreas Stolcke,et al.  Within-class covariance normalization for SVM-based speaker recognition , 2006, INTERSPEECH.

[23]  William M. Campbell,et al.  Support vector machines for speaker and language recognition , 2006, Comput. Speech Lang..

[24]  T. Kinnunen,et al.  Symmetric Distortion Measure for Speaker Recognition , 2004 .

[25]  Lukás Burget,et al.  Comparison of scoring methods used in speaker recognition with Joint Factor Analysis , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[26]  J. Harrington,et al.  Techniques in Speech Acoustics , 1999, Computational Linguistics.

[27]  E. Shlomot,et al.  ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications , 1997, IEEE Commun. Mag..

[28]  S. Furui,et al.  Cepstral analysis technique for automatic speaker verification , 1981 .

[29]  Haizhou Li,et al.  A GMM-based probabilistic sequence kernel for speaker verification , 2007, INTERSPEECH.

[30]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[31]  Jason W. Pelecanos,et al.  Robust Speaker Recognition Over Varying Channels Report from JHU workshop 2008 , 2009 .

[32]  Elizabeth Shriberg,et al.  An anticorrelation kernel for improved system combination in speaker verification , 2008, Odyssey.

[33]  Mike Brookes,et al.  Voice source cepstrum coefficients for speaker identification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[34]  Alex Acero,et al.  Spoken Language Processing: A Guide to Theory, Algorithm and System Development , 2001 .

[35]  Kuldip K. Paliwal,et al.  Usefulness of phase spectrum in human speech perception , 2003, INTERSPEECH.

[36]  Jean Rouat,et al.  Combining pitch and MFCC for speaker identification systems , 2001, Odyssey.

[37]  Hynek Hermansky,et al.  Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification , 2000, Digit. Signal Process..

[38]  David A. van Leeuwen,et al.  NIST and NFI-TNO evaluations of automatic speaker recognition , 2006, Comput. Speech Lang..

[39]  A. B.,et al.  SPEECH COMMUNICATION , 2001 .

[40]  Ramesh A. Gopinath,et al.  Short-time Gaussianization for robust speaker verification , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[41]  Anil K. Jain,et al.  Statistical Pattern Recognition: A Review , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[42]  Ke Chen,et al.  Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification , 1997, Int. J. Pattern Recognit. Artif. Intell..

[43]  Rong Tong,et al.  Fusion of Acoustic and Tokenization Features for Speaker Recognition , 2006, ISCSLP.

[44]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[45]  Hervé Bourlard,et al.  On the combination of speech and speaker recognition , 2003, INTERSPEECH.

[46]  Günther Palm,et al.  A discriminative training algorithm for VQ-based speaker identification , 1999, IEEE Trans. Speech Audio Process..

[47]  Larry P. Heck,et al.  Robustness to telephone handset distortion in speaker recognition by discriminative feature design , 2000, Speech Commun..

[48]  Jianwu Dang,et al.  An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification , 2008, Speech Commun..

[49]  Ibon Saratxaga,et al.  Detection of synthetic speech for the problem of imposture , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[50]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[51]  Tatsuya Kitamura Acoustic analysis of imitated voice produced by a professional impersonator , 2008, INTERSPEECH.

[52]  R. P. Ramachandran,et al.  Robust speaker recognition: a feature-based approach , 1996, IEEE Signal Processing Magazine.

[53]  Tomi Kinnunen Joint Acoustic-Modulation Frequency for Speaker Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[54]  Kishore Prahallad,et al.  AANN: an alternative to GMM for pattern recognition , 2002, Neural Networks.

[55]  Tomi Kinnunen COMPARISON OF CLUSTERING ALGORITHMS IN SPEAKER IDENTIFICATION , 2000 .

[56]  Brian Kingsbury,et al.  Pseudo Pitch Synchronous Analysis of Speech With Applications to Speaker Recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[57]  Bin Ma,et al.  A Generalized Feature Transformation Approach for Channel Robust Speaker Verification , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[58]  F. Harris On the use of windows for harmonic analysis with the discrete Fourier transform , 1978, Proceedings of the IEEE.

[59]  Zdravko Kacic,et al.  A study of harmonic features for the speaker recognition , 1997, Speech Commun..

[60]  Tomi Kinnunen,et al.  Comparative evaluation of maximum a Posteriori vector quantization and gaussian mixture models in speaker verification , 2009, Pattern Recognit. Lett..

[61]  Tomi Kinnunen,et al.  Spectral Features for Automatic Text-Independent Speaker Recognition , 2003 .

[62]  Joseph P. Campbell,et al.  Phonetic speaker recognition , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[63]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[64]  Delphine Charlet,et al.  Prosodic parameter for speaker identification , 2002, INTERSPEECH.

[65]  Elizabeth Shriberg,et al.  Speaker recognition using syllable-based constraints for cepstral frame selection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[66]  A. Stolcke,et al.  Combining feature sets with support vector machines: application to speaker recognition , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[67]  Douglas A. Reynolds,et al.  Language identification using Gaussian mixture model tokenization , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[68]  Joseph P. Campbell,et al.  Gender-dependent phonetic refraction for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[69]  Larry P. Heck,et al.  A model-based transformational approach to robust speaker recognition , 2000, INTERSPEECH.

[70]  Lawrence G. Bahler,et al.  Speaker verification using randomized phrase prompting , 1991, Digit. Signal Process..

[71]  Tomi Kinnunen,et al.  Speaker Verification with Adaptive Spectral Subband Centroids , 2007, ICB.

[72]  Gal Ashour,et al.  Characterization of speech during imitation , 1999, EUROSPEECH.

[73]  Jacob Benesty,et al.  Springer handbook of speech processing , 2007, Springer Handbooks.

[74]  Marc A. Zissman,et al.  Comparison of : Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .

[75]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[76]  Aaron E. Rosenberg,et al.  On the use of instantaneous and transitional spectral information in speaker recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[77]  Ran D. Zilca Text-independent speaker verification using utterance level scoring and covariance modeling , 2002, IEEE Trans. Speech Audio Process..

[78]  Kuldip K. Paliwal,et al.  USE OF VOICING AND PITCH INFORMATION FOR SPEAKER RECOGNITION , 2000 .

[79]  Tomi Kinnunen,et al.  APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA , 2005 .

[80]  Sridha Sridharan,et al.  Modelling session variability in text-independent speaker verification , 2005, INTERSPEECH.

[81]  Jérôme Louradour,et al.  Discriminative power of transient frames in speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[82]  Douglas E. Sturim,et al.  Speaker adaptive cohort selection for Tnorm in text-independent speaker verification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[83]  Douglas A. Reynolds,et al.  The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[84]  Nengheng Zheng,et al.  Integration of Complementary Acoustic Features for Speaker Recognition , 2007, IEEE Signal Processing Letters.

[85]  John H. L. Hansen,et al.  An experimental study of speaker verification sensitivity to computer voice-altered imposters , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[86]  Rahim Saeidi,et al.  Particle Swarm Optimization for Sorted Adapted Gaussian Mixture Models , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[87]  Jean-François Bonastre,et al.  Subband architecture for automatic speaker recognition , 2000, Signal Process..

[88]  Wolfgang Hess,et al.  Pitch Determination of Speech Signals: Algorithms and Devices , 1983 .

[89]  E. Ambikairajah Emerging features for speaker recognition , 2007, 2007 6th International Conference on Information, Communications & Signal Processing.

[90]  Larry P. Heck,et al.  Combining speaker and speech recognition systems , 2002, INTERSPEECH.

[91]  Ivan Magrin-Chagnolleau,et al.  Second-order statistical measures for text-independent speaker identification , 1995, Speech Commun..

[92]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[93]  E. Ambikairajah,et al.  Extraction of FM components from speech signals using all-pole model , 2008 .

[94]  Wei-Guo Gong,et al.  Pitch Synchronous Based Feature Extraction for Noise-Robust Speaker Verification , 2008, 2008 Congress on Image and Signal Processing.

[95]  Mübeccel Demirekler,et al.  Speaker identification by combining multiple classifiers using Dempster-Shafer theory of evidence , 2003, Speech Commun..

[96]  Patrick Kenny,et al.  Comparison between factor analysis and GMM support vector machines for speaker verification , 2008, Odyssey.

[97]  Carmen García-Mateo,et al.  On combining classifiers for speaker authentication , 2003, Pattern Recognit..

[98]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[99]  Moshe Koppel,et al.  Using Post-Classifiers to Enhance Fusion of Low- and High-Level Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[100]  Tomi Kinnunen,et al.  Maximum a Posteriori Adaptation of the Centroid Model for Speaker Verification , 2008, IEEE Signal Processing Letters.

[101]  Larry P. Heck,et al.  Modeling dynamic prosodic variation for speaker verification , 1998, ICSLP.

[102]  Hynek Hermansky,et al.  Speaker verification based on broad phonetic categories , 2001, Odyssey.

[103]  Jean-François Bonastre,et al.  AMIRAL: A Block-Segmental Multirecognizer Architecture for Automatic Speaker Recognition , 2000, Digit. Signal Process..

[104]  Te-Won Lee,et al.  Learning statistically efficient features for speaker recognition , 2002, Neurocomputing.

[105]  Andreas Stolcke,et al.  Speaker Recognition With Session Variability Normalization Based on MLLR Adaptation Transforms , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[106]  Haizhou Li,et al.  Temporal Discrete Cosine Transform : Towards Longer Term Temporal Features for Speaker Verification , 2006 .

[107]  Lukás Burget,et al.  Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[108]  Andreas Stolcke,et al.  Modeling prosodic feature sequences for speaker recognition , 2005, Speech Commun..

[109]  Hideki Kawahara,et al.  Comparative evaluation of F0 estimation algorithms , 2001, INTERSPEECH.

[110]  A. Gray,et al.  Long-term feature averaging for speaker recognition , 1977 .

[111]  Daniel Garcia-Romero,et al.  Robust likelihood ratio estimation in Bayesian forensic speaker recognition , 2003, INTERSPEECH.

[112]  Aladdin M. Ariyaeeinia,et al.  Score normalisation applied to open-set, text-independent speaker identification , 2003, INTERSPEECH.

[113]  Javier Ramírez,et al.  Efficient voice activity detection algorithms using long-term speech information , 2004, Speech Commun..

[114]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[115]  Michael J. Carey,et al.  Robust prosodic features for speaker identification , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[116]  Krzysztof Kryszczuk,et al.  Reliability-Based Decision Fusion in Multimodal Biometric Verification Systems , 2007, EURASIP J. Adv. Signal Process..

[117]  Jason W. Pelecanos,et al.  Text-Independent Speaker Verification in Embedded Environments , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[118]  Nicholas W. D. Evans,et al.  Improving the performance of text-independent short duration SVM- and GMM-based speaker verification , 2008, Odyssey.

[119]  Haizhou Li,et al.  An SVM Kernel With GMM-Supervector Based on the Bhattacharyya Distance for Speaker Recognition , 2009, IEEE Signal Processing Letters.

[120]  P. Fränti,et al.  Fusion of Spectral Feature Sets for Accurate Speaker Identification , 2004 .

[121]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[122]  Stéphane H. Maes,et al.  Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition , 2003, IEEE Trans. Speech Audio Process..

[123]  Andrzej Drygajlo,et al.  Pitch-dependent GMMs for text-independent speaker recognition systems , 2001, INTERSPEECH.

[124]  Niko Brümmer,et al.  Application-independent evaluation of speaker detection , 2006, Comput. Speech Lang..

[125]  Douglas E. Sturim,et al.  The 2004 MIT Lincoln Laboratory speaker recognition system , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[126]  Tomi Kinnunen,et al.  On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition , 2006 .

[127]  William M. Campbell,et al.  Speaker recognition with polynomial classifiers , 2002, IEEE Trans. Speech Audio Process..

[128]  Christian Müller,et al.  Speaker Classification I: Fundamentals, Features, and Methods , 2007, Speaker Classification.

[129]  Francis Nolan The phonetic bases of speaker recognition : Cambridge Studies in Speech Science and Communication, Cambridge University Press, Cambridge, 1983, 221 pp. ISBN 0-521-24486-2 , 1987, Speech Commun..

[130]  Alex Park,et al.  ASR dependent techniques for speaker identification , 2002, INTERSPEECH.

[131]  Patrick Kenny,et al.  Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .

[132]  Hervé Bourlard,et al.  User-customized password speaker verification using multiple reference and background models , 2006, Speech Commun..

[133]  Lukás Burget,et al.  Support vector machines and Joint Factor Analysis for speaker verification , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[134]  Hirotaka Nakasone,et al.  Pitch synchronized speech processing (PSSP) for speaker recognition , 2004, Odyssey.

[135]  Sun-Yuan Kung,et al.  Stochastic Feature Transformation with Divergence-Based Out-of-Handset Rejection for Robust Speaker Verification , 2004, EURASIP J. Adv. Signal Process..

[136]  Sridha Sridharan,et al.  Vector quantization based Gaussian modeling for speaker verification , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[137]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[138]  P. Alku,et al.  A method for generating natural-sounding speech stimuli for cognitive brain research , 1999, Clinical Neurophysiology.

[139]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[140]  Martin Loomes,et al.  Sub-band based text-dependent speaker verification , 2003, Speech Commun..

[141]  Alan V. Oppenheim,et al.  Discrete-Time Signal Pro-cessing , 1989 .

[142]  Bayya Yegnanarayana,et al.  Extraction and representation of prosodic features for language and speaker recognition , 2008, Speech Commun..

[143]  Sridha Sridharan,et al.  Discriminant NAP for SVM speaker recognition , 2008, Odyssey.

[144]  Pietro Laface,et al.  Compensation of Nuisance Factors for Speaker and Language Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[145]  Robert Faltlhauser,et al.  Improving speaker recognition using phonetically structured Gaussian mixture models , 2001, INTERSPEECH.

[146]  Bayya Yegnanarayana,et al.  Prosodic features for speaker verification , 2006, INTERSPEECH.

[147]  Keiichi Tokuda,et al.  A new approach to designing a feature extractor in speaker identification based on discriminative feature extraction , 2001, Speech Commun..

[148]  Sridha Sridharan,et al.  Explicit modelling of session variability for speaker verification , 2008, Comput. Speech Lang..

[149]  Delphine Charlet,et al.  Speaker recognition by location in the space of reference speakers , 2006, Speech Commun..

[150]  Samy Bengio,et al.  Spectral Subband Centroids as Complementary Features for Speaker Authentication , 2004, ICBA.

[151]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[152]  Andreas Stolcke,et al.  Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[153]  William M. Campbell,et al.  A new kernel for SVM MLLR based speaker recognition , 2007, INTERSPEECH.

[154]  B. Atal Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.

[155]  Mohamed Chetouani,et al.  Filter Bank Design for Speaker Diarization Based on Genetic Algorithms , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[156]  Roland Auckenthaler,et al.  Gaussian selection applied to text-independent speaker verification , 2001, Odyssey.

[157]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[158]  Alvin F. Martin,et al.  NIST Speaker Recognition Evaluations Utilizing the Mixer Corpora—2004, 2005, 2006 , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[159]  Yuan-Fu Liao,et al.  Eigen-prosody analysis for robust speaker recognition under mismatch handset environment , 2004, INTERSPEECH.

[160]  Larry P. Heck,et al.  Phonetic class-based speaker verification , 2003, INTERSPEECH.

[161]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[162]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[163]  A. W. M. van den Enden,et al.  Discrete Time Signal Processing , 1989 .

[164]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[165]  Jérôme Louradour,et al.  SVM speaker verification using a new sequence Kernel , 2005, 2005 13th European Signal Processing Conference.

[166]  Sven C. Martin,et al.  Statistical Language Modeling Using Leaving-One-Out , 1997 .

[167]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[168]  Carol Y. Espy-Wilson,et al.  A new set of features for text-independent speaker identification , 2006, INTERSPEECH.

[169]  Andreas Stolcke,et al.  Nonparametric feature normalization for SVM-based speaker verification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[170]  Bayya Yegnanarayana,et al.  Combining evidence from residual phase and MFCC features for speaker recognition , 2006, IEEE Signal Processing Letters.

[171]  Samy Bengio,et al.  A comparative study of adaptation methods for speaker verification , 2002, INTERSPEECH.

[172]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[173]  Robert I. Damper,et al.  Improving speaker identification in noise by subband processing and decision fusion , 2003, Pattern Recognit. Lett..

[174]  David A. van Leeuwen,et al.  Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006 , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[175]  Driss Matrouf,et al.  Artificial impostor voice transformation effects on false acceptance rates , 2007, INTERSPEECH.

[176]  Tomi Kinnunen,et al.  Text-independent speaker recognition using graph matching , 2008, Pattern Recognit. Lett..

[177]  Elizabeth Shriberg,et al.  System combination using auxiliary information for speaker verification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[178]  Rong Tong,et al.  Chinese Dialect Identification Using Tone Features Based on Pitch Flux , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[179]  T. Kinnunen,et al.  Long-Term F0 Modeling for Text-Independent Speaker Recognition , 2005 .

[180]  Man-Wai Mak,et al.  Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification , 2004, Speech Commun..

[181]  Hynek Hermansky,et al.  Should recognizers have ears? , 1998, Speech Commun..

[182]  Richard J. Mammone,et al.  Speaker recognition - general classifier approaches and data fusion methods , 2002, Pattern Recognit..

[183]  J.H.L. Hansen,et al.  An efficient scoring algorithm for Gaussian mixture model based speaker identification , 1998, IEEE Signal Processing Letters.

[184]  William M. Campbell,et al.  Phonetic Speaker Recognition with Support Vector Machines , 2003, NIPS.

[185]  Bin Ma,et al.  Joint map adaptation of feature transformation and Gaussian Mixture Model for speaker recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[186]  Christian Müller Speaker Classification II, Selected Projects , 2007, Speaker Classification.

[187]  Hilla Peretz,et al.  The , 1966 .

[188]  Eliathamby Ambikairajah,et al.  FM features for automatic forensic speaker recognition , 2008, INTERSPEECH.

[189]  Hsiao-Chuan Wang,et al.  Joint estimation of feature transformation parameters and Gaussian mixture model for speaker identification , 1999, Speech Commun..

[190]  Tanja Schultz,et al.  Speaker identification using multilingual phone strings , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[191]  Bing Xiang Text-independent speaker verification with dynamic trajectory model , 2003, IEEE Signal Processing Letters.

[192]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[193]  Man-Wai Mak,et al.  A Comparison of Various Adaptation Methods for Speaker Verification With Limited Enrollment Data , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[194]  Gunnar Rätsch,et al.  An introduction to kernel-based learning algorithms , 2001, IEEE Trans. Neural Networks.

[195]  Driss Matrouf,et al.  State-of-the-Art Performance in Text-Independent Speaker Verification Through Open-Source Software , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[196]  Tomi Kinnunen,et al.  Designing a speaker-discriminative adaptive filter bank for speaker recognition , 2002, INTERSPEECH.

[197]  Beat Pfister,et al.  Estimating the weight of evidence in forensic speaker verification , 2003, INTERSPEECH.

[198]  Gérard Chollet,et al.  Linear and non-linear fusion of ALISP-based and GMM systems for text-independent speaker verification , 2004, Odyssey.

[199]  Toby Berger,et al.  Efficient text-independent speaker verification with structural Gaussian mixture models and neural network , 2003, IEEE Trans. Speech Audio Process..

[200]  Marcos Faúndez-Zanuy,et al.  Investigation on LP-residual representations for speaker identification , 2009, Pattern Recognit..

[201]  Patrick Kenny,et al.  Speaker and Session Variability in GMM-Based Speaker Verification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[202]  Aaron E. Rosenberg,et al.  Report: A vector quantization approach to speaker recognition , 1987, AT&T Technical Journal.

[203]  J. E. Porter,et al.  Normalizations and selection of speech segments for speaker recognition scoring , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[204]  Marie A. Roch Gaussian-selection-based non-optimal search for speaker identification , 2006, Speech Commun..

[205]  Gérard Chollet,et al.  Support Vector Gmms for Speaker Verification , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[206]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[207]  Larry P. Heck,et al.  A lognormal tied mixture model of pitch for prosody based speaker recognition , 1997, EUROSPEECH.

[208]  Nikki Mirghafori,et al.  Word-conditioned HMM supervectors for speaker recognition , 2007, INTERSPEECH.

[209]  Douglas A. Reynolds,et al.  Modeling prosodic dynamics for speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[210]  Rajesh M. Hegde,et al.  Application of the modified group delay function to speaker identification and discrimination , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[211]  Rong Tong,et al.  The I4U system in NIST 2008 speaker recognition evaluation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[212]  Bayya Yegnanarayana,et al.  Speaker-specific mapping for text-independent speaker recognition , 2003, Speech Commun..

[213]  Julian Fiérrez,et al.  Speaker verification using speaker- and test-dependent fast score normalization , 2007, Pattern Recognit. Lett..

[214]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[215]  B. Atal Automatic Speaker Recognition Based on Pitch Contours , 1969 .

[216]  Sridha Sridharan,et al.  Data-driven clustering for blind feature mapping in speaker verification , 2005, INTERSPEECH.

[217]  F Botti,et al.  The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications. , 2004, Forensic science international.

[218]  Biing-Hwang Juang,et al.  A vector quantization approach to speaker recognition , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[219]  Sunil K. Gupta,et al.  Text-independent speaker verification based on broad phonetic segmentation of speech , 1992, Digit. Signal Process..

[220]  G. Ruske,et al.  Improving Speaker Recognition Performance Using Phonetically Structured Gaussian Mixture Models , 2001 .

[221]  Eric G. Hansen,et al.  Glottal modeling and closed-phase analysis for speaker recognition , 2004, Odyssey.

[222]  Mark J. F. Gales,et al.  Combining Derivative and Parametric Kernels for Speaker Verification , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[223]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[224]  Julian Fiérrez,et al.  On the use of quality measures for text-independent speaker recognition , 2004, Odyssey.

[225]  Kornel Laskowski,et al.  Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[226]  Heinz Hügli,et al.  Usefulness of the LPC-residue in text-independent speaker verification , 1995, Speech Commun..

[227]  Douglas E. Sturim,et al.  Speaker indexing in large audio databases using anchor models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[228]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[229]  RockOn Team,et al.  Re: Attenuation compensation in single-photon emission tomography: a comparative evaluation. , 1983, Journal of nuclear medicine : official publication, Society of Nuclear Medicine.

[230]  Hideki Kawahara,et al.  Comparative evaluation of F0 estimation algorithms , 2001 .

[231]  Steve Renals,et al.  Speaker verification using sequence discriminant support vector machines , 2005, IEEE Transactions on Speech and Audio Processing.

[232]  Douglas A. Reynolds,et al.  A Tutorial on Text-Independent Speaker Verification , 2004, EURASIP J. Adv. Signal Process..

[233]  Itshak Lapidot,et al.  Unsupervised speaker recognition based on competition between self-organizing maps , 2002, IEEE Trans. Neural Networks.

[234]  Richard J. Mammone,et al.  Speaker recognition using neural networks and conventional classifiers , 1994, IEEE Trans. Speech Audio Process..

[235]  S. R. Mahadeva Prasanna,et al.  Extraction of speaker-specific excitation information from linear prediction residual of speech , 2006, Speech Commun..

[236]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[237]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[238]  J.P. Campbell Jr. Speaker recognition: a tutorial , 1997, Proc. IEEE.

[239]  Douglas A. Reynolds,et al.  A study of computation speed-ups of the GMM-UBM speaker recognition system , 1999, EUROSPEECH.

[240]  George R. Doddington,et al.  Speaker verification over long distance telephone lines , 1989, International Conference on Acoustics, Speech, and Signal Processing.

[241]  Peter Regel-Brietzmann,et al.  Combination of vector quantization and gaussian mixture models for speaker verification with sparse training data , 1999, EUROSPEECH.

[242]  Pasi Fränti,et al.  Accuracy of MFCC-Based Speaker Recognition in Series 60 Device , 2005, EURASIP J. Adv. Signal Process..

[243]  Tomi Kinnunen,et al.  Real-time speaker identification and verification , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[244]  Jean-François Bonastre,et al.  Localization and selection of speaker-specific information with statistical modeling , 2000, Speech Commun..

[245]  Rong Tong,et al.  Spoken Language Recognition Using Ensemble Classifiers , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[246]  Haizhou Li,et al.  Characterizing speech utterances for speaker verification with sequence kernel SVM , 2008, INTERSPEECH.

[247]  Patrick Kenny,et al.  Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[248]  Bin Ma,et al.  Using MAP estimation of feature transformation for speaker recognition , 2008, INTERSPEECH.

[249]  Haizhou Li,et al.  Dimension reduction of the modulation spectrogram for speaker verification , 2008, Odyssey.

[250]  Sadaoki Furui,et al.  Recent advances in speaker recognition , 1997, Pattern Recognit. Lett..

[251]  Levent M. Arslan,et al.  Frequency analysis of speaker identification , 2001, Odyssey.

[252]  Sun-Yuan Kung,et al.  Robust speaker verification from GSM-transcoded speech based on decision fusion and feature transformation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[253]  Frédéric Bimbot,et al.  Application of time-frequency principal component analysis to text-independent speaker identification , 2002, IEEE Trans. Speech Audio Process..