Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates

[1]  Austin F. Frank,et al.  Analyzing linguistic data: a practical introduction to statistics using R , 2010 .

[2]  Daniel Jurafsky,et al.  Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates , 2008, ACL.

[3]  Sadaoki Furui,et al.  Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance , 2008, Comput. Speech Lang.

[4]  Andreas Stolcke,et al.  Recent innovations in speech-to-text transcription at SRI-ICSI-UW , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  P. Boersma Praat: doing phonetics by computer (version 4.4.24) , 2006 .

[6]  Mark J. F. Gales,et al.  Automatic transcription of conversational telephone speech , 2005, IEEE Transactions on Speech and Audio Processing.

[7]  Alicia B. Wassink,et al.  Pacific northwest vowels: A Seattle neighborhood dialect study , 2005 .

[8]  Mark J. F. Gales,et al.  Training LVCSR systems on thousands of hours of data , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[9]  Lori Lamel,et al.  Do speech recognizers prefer female speakers? , 2005, INTERSPEECH.

[10]  E. Ziegel Permutation, Parametric, and Bootstrap Tests of Hypotheses (3rd ed.) , 2005 .

[11]  Xunying Liu,et al.  Development of the 2004 CU-HTK English CTS systems using more than two thousand hours of data , 2004 .

[12]  Julia Hirschberg,et al.  Prosodic and other cues to speech recognition failures , 2004, Speech Commun.

[13]  Mark J. F. Gales,et al.  Development of the 2003 CU-HTK conversational telephone speech transcription system , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Rosalind Temple,et al.  Phonetic Interpretation Papers in Laboratory Phonology VI: Acknowledgements , 2004 .

[15]  Andreas Stolcke,et al.  Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures , 2003, NAACL.

[16]  Andreas Stolcke,et al.  Prosodic knowledge sources for automatic speech recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[17]  Dan Jurafsky,et al.  Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. , 2003, The Journal of the Acoustical Society of America.

[18]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[19]  Taehong Cho,et al.  Domain-initial articulatory strengthening in four languages , 2003 .

[20]  Mary P. Harper,et al.  The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources , 2002, EMNLP.

[21]  Daniel Povey,et al.  Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[23]  Sadaoki Furui,et al.  Error analysis using decision trees in spontaneous presentation speech recognition , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

[24]  M. Tanenhaus,et al.  Time Course of Frequency Effects in Spoken-Word Recognition: Evidence from Eye Movements , 2001, Cognitive Psychology.

[25]  S. Goldinger,et al.  Phonetic priming, neighborhood activation, and PARSYN , 2000, Perception & psychophysics.

[26]  Eric Fosler-Lussier,et al.  Effects of speaking rate and word frequency on pronunciations in conversational speech , 1999, Speech Commun.

[27]  P. Luce,et al.  Probabilistic Phonotactics and Neighborhood Activation in Spoken Word Recognition , 1999 .

[28]  Sumio Ohno,et al.  On the effects of speech rate upon parameters of the command-response model for the fundamental frequency contours of speech , 1998, ICSLP.

[29]  D. Pisoni,et al.  Recognizing Spoken Words: The Neighborhood Activation Model , 1998, Ear and hearing.

[30]  P. Keating,et al.  Articulatory strengthening at edges of prosodic domains. , 1997, The Journal of the Acoustical Society of America.

[31]  David B. Pisoni,et al.  Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics , 1996, Speech Commun.

[32]  R. P. Fahey,et al.  On explaining certain male-female differences in the phonetic realization of vowel categories , 1996 .

[33]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[34]  P. Good Permutation, Parametric, and Bootstrap Tests of Hypotheses , 2005 .

[35]  Richard M. Stern,et al.  On the effects of speech rate in large vocabulary speech recognition systems , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[36]  E. Shriberg,et al.  Acoustic properties of disfluent repetitions , 1995 .

[37]  A. Syrdal,et al.  Applied speech technology , 1995 .

[38]  David B. Pisoni,et al.  Automatic measurement of speech recognition performance: a comparison of six speaker-dependent recognition devices , 1987 .

[39]  W. Marslen-Wilson Functional parallelism in spoken word-recognition , 1987, Cognition.

[40]  G. R. Doddington,et al.  Computers: Speech recognition: Turning theory to practice: New ICs have brought the requisite computer power to speech technology; an evaluation of equipment shows where it stands today , 1981, IEEE Spectrum.

[41]  A. E. Hieke A Content-Processing View of Hesitation Phenomena , 1981 .

[42]  D. Howes On the interpretation of word frequency as a variable affecting speed of recognition. , 1954, Journal of experimental psychology.