Acoustic approaches to gender and accent identification

There has been considerable research on speaker and language recognition from speech samples. Accent recognition has received less attention. Although it resembles language identification, the differences between accents of one language are more fine-grained than the differences between languages, which makes the problem harder for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions of state-of-the-art algorithms, and they improve performance on gender and accent recognition. The first part of the thesis focuses on gender identification and presents a technique that improves performance when training and test conditions are mismatched. The bulk of the thesis is concerned with applying the i-Vector technique, the most successful approach to acoustic classification to have emerged in recent years, to accent identification. We show that high-accuracy accent identification is possible without relying on transcriptions or phoneme recognition algorithms. The thesis describes the stages in the development of i-Vector based accent classification that improve on the standard approaches usually applied to speaker or language identification, which are insufficient for this task. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, front-end parameters, i-Vector configuration parameters, and an optimised fusion of the i-Vector classifiers that can be obtained from the same data. We claim to have achieved the best accent identification performance on the test corpus for acoustic methods, with an identification rate of up to 90%.
This performance is even better than that of previously reported acoustic-phonotactic systems on the same corpus, and is very close to the performance obtained with transcription-based accent identification. Finally, we demonstrate that using our techniques for speech recognition leads to considerably lower word error rates.

Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition.
