Speech Processing

Analytical background and techniques: discrete-time signals, systems, and transforms; analysis of discrete-time speech signals; probability and random processes; linear model and dynamic system model; optimization methods and estimation theory; statistical pattern recognition. Fundamentals of speech science: phonetic process; phonological process. Computational phonology and phonetics: computational phonology; computational models for speech production; computational models for auditory speech processing. Speech technology in selected areas: speech recognition; speech enhancement; speech synthesis.


[253]  Aaron E. Rosenberg,et al.  A comparative performance study of several pitch detection algorithms , 1976 .

[254]  Elliot Saltzman,et al.  The dynamical perspectives on speech production: Data and theory , 1986 .

[255]  William L. Henke,et al.  Dynamic articulatory model of speech production using computer simulation , 1966 .

[256]  K. Stevens,et al.  Feature geometry and the vocal tract , 1994, Phonology.

[257]  Chin-Hui Lee,et al.  Bayesian Adaptive Learning and MAP Estimation of HMM , 1996 .

[258]  Li Deng,et al.  Nonstationary-state hidden Markov model representation of speech signals for speech enhancement , 2002, Signal Process..

[259]  Li Deng,et al.  Speaker-independent phonetic classification using hidden Markov models with mixtures of trend functions , 1997, IEEE Trans. Speech Audio Process..

[260]  Steven Kay,et al.  Fundamentals Of Statistical Signal Processing , 2001 .

[261]  David B. Pisoni,et al.  Text-to-speech: the MITalk system , 1987 .

[262]  R H Wilson,et al.  Word recognition with segmented-alternated CVC words: a preliminary report on listeners with normal hearing. , 1984, Journal of speech and hearing research.

[263]  M.R. Schroeder,et al.  Models of hearing , 1975, Proceedings of the IEEE.

[264]  L. R. Rabiner,et al.  On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition , 1983, The Bell System Technical Journal.

[265]  Helen Meng,et al.  The Use of Distinctive Features for Automatic Speech Recognition , 1991 .

[266]  Chilin Shih,et al.  Bell Laboratories Russian text-to-speech system , 1997, EUROSPEECH.

[267]  Li Deng,et al.  High-performance robust speech recognition using stereo training data , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[268]  L D Braida,et al.  Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. , 1994, The Journal of the Acoustical Society of America.

[269]  Ronald Rosenfeld,et al.  A maximum entropy approach to adaptive statistical language modelling , 1996, Comput. Speech Lang..

[270]  D. W. Thomas Linear Prediction of Speech, J.D. Markel, A.H. Gray. Springer-Verlag, Berlin, Heidelberg, New York (1976), xii+288 pp. , 1977 .

[271]  Mari Ostendorf,et al.  A dynamical system model for generating fundamental frequency for speech synthesis , 1999, IEEE Trans. Speech Audio Process..

[272]  Martin Russell,et al.  A segmental HMM for speech pattern modelling , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[273]  Saito,et al.  Fundamentals of Speech Signal Processing , 1986 .

[274]  Robert M. Gray,et al.  Toeplitz and Circulant Matrices: A Review , 2005, Found. Trends Commun. Inf. Theory.

[275]  Dennis H. Klatt,et al.  Perception of Segment Duration in Sentence Contexts , 1975 .

[276]  K. Payton Vowel processing by a model of the auditory periphery: A comparison to eighth‐nerve responses , 1988 .

[277]  Katrin Kirchhoff,et al.  Robust speech recognition using articulatory information , 1998 .

[278]  B. Lindblom,et al.  Role of articulation in speech perception: clues from production. , 1996, The Journal of the Acoustical Society of America.

[279]  Steven Greenberg,et al.  Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..

[280]  P. Mermelstein Articulatory model for the study of speech production. , 1973, The Journal of the Acoustical Society of America.

[281]  D O Kim,et al.  Spatial response profiles of posteroventral cochlear nucleus neurons and auditory-nerve fibers in unanesthetized decerebrate cats: response to pure tones. , 1991, The Journal of the Acoustical Society of America.

[282]  R. B. Monsen,et al.  The accuracy of formant frequency measurements: a comparison of spectrographic analysis and linear prediction. , 1983, Journal of speech and hearing research.

[283]  Alex Bateman,et al.  An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[284]  Li Deng,et al.  Integrated-multilingual speech recognition using universal phonological features in a functional speech production model , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[285]  Li Deng,et al.  A Markov model containing state-conditioned second-order non-stationarity: application to speech recognition , 1995, Comput. Speech Lang..

[286]  Alex Acero,et al.  Automatic generation of synthesis units for trainable text-to-speech systems , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[287]  Yannis Stylianou,et al.  TD-PSOLA versus harmonic plus noise model in diphone based speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[288]  Philip N. Garner,et al.  Using formant frequencies in speech recognition , 1997, EUROSPEECH.

[289]  Renato De Mori,et al.  A Cache-Based Natural Language Model for Speech Recognition , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[290]  J L Miller,et al.  The influence of sentential speaking rate on the internal structure of phonetic categories. , 1994, The Journal of the Acoustical Society of America.

[291]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[292]  Hamid Sheikhzadeh,et al.  Comparative performance of spectral subtraction and HMM-based speech enhancement strategies with application to hearing aid design , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[293]  J Hillenbrand,et al.  Perception of the voiced-voiceless contrast in syllable-final stops. , 1984, The Journal of the Acoustical Society of America.

[294]  Enrico Mugnaini,et al.  Neuronal Circuits in the Dorsal Cochlear Nucleus , 1981 .

[295]  Li Deng,et al.  Large-vocabulary speech recognition under adverse acoustic environments , 2000, INTERSPEECH.

[296]  Hamid Sheikhzadeh,et al.  HMM-based strategies for enhancement of speech signals embedded in nonstationary noise , 1998, IEEE Trans. Speech Audio Process..

[297]  Hamid Sheikhzadeh,et al.  Real-time speech synthesis on an ultra low-resource, programmable DSP system , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[298]  Li Deng,et al.  Computational Models for Speech Production , 2018, Speech Processing.

[299]  H. Sussman,et al.  An investigation of locus equations as a source of relational invariance for stop place categorization , 1991 .

[300]  John E. Shore,et al.  Discrete utterance speech recognition without time alignment , 1983, IEEE Trans. Inf. Theory.

[301]  J J Jenkins,et al.  Vowel identification in mixed-speaker silent-center syllables. , 1994, The Journal of the Acoustical Society of America.

[302]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[303]  C. L. Searle,et al.  Stop consonant discrimination based on human audition. , 1979, The Journal of the Acoustical Society of America.

[304]  Li Deng,et al.  A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model , 2001, IEEE Trans. Speech Audio Process..

[305]  A. Liberman,et al.  Some Cues for the Distinction Between Voiced and Voiceless Stops in Initial Position , 1957 .

[306]  Raymond D. Kent,et al.  Models of Speech Production , 1976 .

[307]  Carl E. Rasmussen,et al.  The Infinite Gaussian Mixture Model , 1999, NIPS.

[308]  A.V. Oppenheim,et al.  The importance of phase in signals , 1980, Proceedings of the IEEE.

[309]  Martin J. Russell,et al.  Modeling speech variability with segmental HMMs , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[310]  S. Soli,et al.  Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. , 1994, The Journal of the Acoustical Society of America.

[311]  D. Klatt,et al.  Structure of a phonological rule component for a synthesis-by-rule program , 1976 .

[312]  Paul Mermelstein,et al.  Difference limens for formant frequencies of steady‐state and consonant‐bound vowels , 1976 .

[313]  M. Sachs,et al.  Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus , 1988 .

[314]  J L Miller Nonindependence of feature processing in initial consonants. , 1977, Journal of speech and hearing research.

[315]  Li Deng,et al.  A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition , 1998, Speech Commun..

[316]  S. McCandless,et al.  An algorithm for automatic formant extraction using linear prediction spectra , 1974 .

[317]  Max A. Viergever,et al.  Mechanics of the inner ear: A mathematical approach , 1980 .

[318]  Laurel H. Carney,et al.  Evaluating Auditory Performance Limits: II. One-Parameter Discrimination with Random-Level Variation , 2001, Neural Computation.

[319]  Li Deng,et al.  A Bayesian Approach to Speaker Verification , 2001 .

[320]  K. Stevens,et al.  Emotions and speech: some acoustical correlates. , 1972, The Journal of the Acoustical Society of America.

[321]  Li Deng,et al.  HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features , 1997, IEEE Trans. Speech Audio Process..

[322]  B. Moore,et al.  Frequency and intensity difference limens for harmonics within complex tones. , 1984, The Journal of the Acoustical Society of America.

[323]  Y Xu,et al.  Production and perception of coarticulated tones. , 1994, The Journal of the Acoustical Society of America.

[324]  J. 't Hart,et al.  Discriminability of the size of pitch movements in speech , 1974 .

[325]  D. Ladd,et al.  Declination: a review and some hypotheses , 1984, Phonology Yearbook.

[326]  Li Deng,et al.  Initial evaluation of hidden dynamic models on conversational speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[327]  Mandy Eberhart,et al.  Speech Communications Human And Machine , 2016 .

[328]  Oded Ghitza,et al.  Hidden Markov models with templates as non-stationary states: an application to speech recognition , 1993, Comput. Speech Lang..

[329]  D. O'Shaughnessy Consonant durations in clusters , 1974 .

[330]  F Rattay,et al.  The mammalian auditory hair cell: a simple electric circuit model. , 1998, The Journal of the Acoustical Society of America.

[331]  Stefanie Shattuck-Hufnagel,et al.  Implementation of a model for lexical access based on features , 1992, ICSLP.

[332]  Chin-Hui Lee,et al.  On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate , 1997, IEEE Trans. Speech Audio Process..

[333]  Li Deng,et al.  Transitional speech units and their representation by regressive Markov states: applications to speech recognition , 1996, IEEE Trans. Speech Audio Process..

[334]  A. Liberman,et al.  Tempo of frequency change as a cue for distinguishing classes of speech sounds. , 1956, Journal of experimental psychology.

[335]  B. Lindblom,et al.  Interaction between duration, context, and speaking style in English stressed vowels , 1994 .

[336]  W. Brownell,et al.  Synaptic organization of eighth nerve afferents to cat dorsal cochlear nucleus. , 1983, Journal of neurophysiology.

[337]  Antonio M. Peinado,et al.  Model-based compensation of the additive noise for continuous speech recognition. experiments using the Aurora II database and tasks , 2001, INTERSPEECH.

[338]  A. Liberman,et al.  The motor theory of speech perception revised , 1985, Cognition.

[339]  M. Halle,et al.  Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates , 1961 .

[340]  John Hart,et al.  A Perceptual Study of Intonation , 1990 .