Automatic assessment of children speech to support language learning

This work focuses on the pattern recognition aspects of computer-assisted pronunciation training (CAPT) for second language learning. An overview of commercial systems shows that the growing field of computer-assisted language learning addresses pronunciation training only to a small extent, although the state-of-the-art section already presents a number of approaches to automatic assessment. In this thesis, different approaches are extended and combined. In particular, a large set of nearly 200 pronunciation and prosodic features is developed, so that pronunciation scoring is treated as a classification task in a high-dimensional feature space. Automatic speech recognition is the basis of most pronunciation scoring algorithms. The system presented in this thesis supports second language learning in school, i.e. the target users are children. For this reason, a state-of-the-art speech recognition engine is adapted to children's speech, since automatic systems recognise young speakers poorly. Phonetically motivated rules for typical mispronunciation errors are integrated into the system to make it suitable for pronunciation scoring. Evaluating an algorithm for pronunciation assessment is more difficult than simply counting the correctly recognised mistakes, since no objective ground truth exists. This is shown by evaluating the annotations of 14 teachers. Nevertheless, different measures verify that the accuracy of the system (in comparison with teachers) reaches the level of agreement among the teachers. The evaluation is conducted with native German speakers learning English.
