Paper Information - Speech Processing

Speech Processing

Analytical background and techniques: discrete-time signals, systems, and transforms; analysis of discrete-time speech signals; probability and random processes; linear model and dynamic system model; optimization methods and estimation theory; statistical pattern recognition. Fundamentals of speech science: phonetic process; phonological process. Computational phonology and phonetics: computational phonology; computational models for speech production; computational models for auditory speech processing. Speech technology in selected areas: speech recognition; speech enhancement; speech synthesis.

Douglas D. O'Shaughnessy | Li Deng

[1] Keikichi Hirose,et al. A minimax search algorithm for CDHMM based robust continuous speech recognition , 1998, ICSLP.

[2] M. Liberman. The cochlear frequency map for the cat: labeling auditory-nerve fibers of known characteristic frequency. , 1982, The Journal of the Acoustical Society of America.

[3] L. Joseph,et al. Bayesian Statistics: An Introduction , 1989 .

[4] B.-H. Juang,et al. On the hidden Markov model and dynamic time warping for speech recognition — A unified view , 1984, AT&T Bell Laboratories Technical Journal.

[5] Biing-Hwang Juang,et al. Mixture autoregressive hidden Markov models for speech signals , 1985, IEEE Trans. Acoust. Speech Signal Process..

[6] D. Ostry,et al. The equilibrium point hypothesis and its application to speech motor control. , 1996, Journal of speech and hearing research.

[7] Don McAllaster,et al. Fabricating conversational speech data with acoustic models: a program to examine model-data mismatch , 1998, ICSLP.

[8] G. P. Moore,et al. Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. , 1967, Biophysical journal.

[9] R S McGowan,et al. Task dynamic and articulatory recovery of lip and velar approximations under model mismatch conditions. , 1996, The Journal of the Acoustical Society of America.

[10] A. Nuttall. Some windows with very good sidelobe behavior , 1981 .

[11] K. Stevens. Airflow and Turbulence Noise for Fricative and Stop Consonants: Static Considerations , 1971 .

[12] Ernst Terhardt,et al. Facts and Models in Hearing , 1974 .

[13] Michael W. Macon,et al. Control of spectral dynamics in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[14] Steven Greenberg,et al. Auditory Processing of Speech , 2006 .

[15] Katsuhiko Shirai,et al. Articulatory Model and the Estimation of Articulatory Parameters by Nonlinear Regression Method. , 1976 .

[16] R. Ohba,et al. Pole-zero analysis of voiced speech using group delay characteristics , 1984 .

[17] Chang‐Jin Kim,et al. Dynamic linear models with Markov-switching , 1994 .

[18] J. Kelso,et al. Functionally specific articulatory cooperation following jaw perturbations during speech: evidence for coordinative structures. , 1984, Journal of experimental psychology. Human perception and performance.

[19] Dexter R. F. Irvine,et al. Auditory Brainstem Processing: Integration and Conclusions , 1986 .

[20] C Giguère,et al. A computational model of the auditory periphery for speech and hearing research. I. Ascending path. , 1994, The Journal of the Acoustical Society of America.

[21] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[22] Biing-Hwang Juang,et al. Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[23] Sanjit K. Mitra,et al. Digital Signal Processing: A Computer-Based Approach , 1997 .

[24] Naftali Z. Tisby. On the application of mixture AR hidden Markov models to text independent speaker recognition , 1991, IEEE Trans. Signal Process..

[25] Raimond L. Winslow,et al. Some Aspects of Rate Coding in the Auditory Nerve , 1986 .

[26] Douglas D. O'Shaughnessy. Speech Communications: Human and Machine , 2012 .

[27] J. P. Madden. The role of frequency resolution and temporal resolution in the detection of frequency modulation. , 1994, The Journal of the Acoustical Society of America.

[28] C. Daniel Geisler,et al. Mathematical Models of the Mechanics of the Inner Ear , 1976 .

[29] Q. Summerfield,et al. On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. , 1977, The Journal of the Acoustical Society of America.

[30] B H Repp,et al. Relative Amplitude of Aspiration Noise as a Voicing Cue for Syllable-Initial Stop Consonants , 1979, Language and speech.

[31] B Gold,et al. Parallel processing techniques for estimating pitch periods of speech in the time domain. , 1969, The Journal of the Acoustical Society of America.

[32] Nina H. MacDonald. Duration as a syntactic boundary cue in ambiguous sentences , 1976, ICASSP.

[33] J. Pierrehumbert. The perception of fundamental frequency declination. , 1979, The Journal of the Acoustical Society of America.

[34] J. Hillenbrand,et al. Acoustic characteristics of American English vowels. , 1994, The Journal of the Acoustical Society of America.

[35] Campbell L. Searle,et al. Time‐domain analysis of auditory‐nerve fiber firing rates , 1989 .

[36] B.W. Dickinson,et al. An introduction to statistical signal processing with applications , 1979, Proceedings of the IEEE.

[37] D. D. Greenwood. Critical Bandwidth and the Frequency Coordinates of the Basilar Membrane , 1961 .

[38] C V Pavlovic,et al. An evaluation of some assumptions underlying the articulation index. , 1984, The Journal of the Acoustical Society of America.

[39] D. Klatt,et al. Discrimination of fundamental frequency contours in synthetic speech: implications for models of pitch perception. , 1973, The Journal of the Acoustical Society of America.

[40] Li Deng. Autosegmental Representation of Phonological Units of Speech and its Phonetic Interface , 1997 .

[41] Mari Ostendorf,et al. Moving beyond the 'beads-on-a-string' model of speech , 1999 .

[42] R. Gray,et al. Distortion measures for speech processing , 1980 .

[43] Iain R. Murray,et al. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. , 1993, The Journal of the Acoustical Society of America.

[44] L D Braida,et al. Consistency among speech parameter vectors: application to predicting speech intelligibility. , 1996, The Journal of the Acoustical Society of America.

[45] Li Deng,et al. A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech , 2000, Comput. Speech Lang..

[46] B. Repp. Categorical Perception: Issues, Methods, Findings , 1984 .

[47] Gerald Langner,et al. Periodicity coding in the auditory system , 1992, Hearing Research.

[48] T. W. Parsons. Separation of speech from interfering speech by means of harmonic selection , 1976 .

[49] F. Jelinek,et al. Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[50] Ronald J. MacGregor,et al. Neural and brain modeling , 1987 .

[51] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[52] Pascal Perrier,et al. Compensation strategies for the perturbation of the rounded vowel [u] using a lip-tube : A study of the control space in speech production , 1995 .

[53] John H. L. Hansen,et al. An auditory-based distortion measure with application to concatenative speech synthesis , 1998, IEEE Trans. Speech Audio Process..

[54] M. Sondhi,et al. New methods of pitch extraction , 1968 .

[55] Martin J. Russell,et al. Speech recognition using a linear dynamic segmental HMM , 1995, EUROSPEECH.

[56] S. Shamma. Speech processing in the auditory system. I: The representation of speech sounds in the responses of the auditory nerve. , 1985, The Journal of the Acoustical Society of America.

[57] Keikichi Hirose,et al. Robust speech recognition based on a Bayesian prediction approach , 1999, IEEE Trans. Speech Audio Process..

[58] Li Deng,et al. Computational Models for Auditory Speech Processing , 1999 .

[59] H M Sussman,et al. An investigation of stop place of articulation as a function of syllable position: a locus equation perspective. , 1997, The Journal of the Acoustical Society of America.

[60] Michael I. Jordan,et al. Goal-based speech motor control: A theoretical framework and some preliminary data , 1995 .

[61] R Meddis,et al. A computer model of a cochlear-nucleus stellate cell: responses to amplitude-modulated and pure-tone stimuli. , 1992, The Journal of the Acoustical Society of America.

[62] Yoshinori Sagisaka,et al. Speech segment network approach for optimization of synthesis unit set , 1995, Comput. Speech Lang..

[63] Sailes K. Sengupta. Fundamentals of Statistical Signal Processing: Estimation Theory , 1995 .

[64] L. Carney,et al. A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression. , 2001, The Journal of the Acoustical Society of America.

[65] L. Rabiner,et al. An introduction to hidden Markov models , 1986, IEEE ASSP Magazine.

[66] Li Deng,et al. An overlapping-feature-based phonological model incorporating linguistic constraints: applications to speech recognition. , 2002, The Journal of the Acoustical Society of America.

[67] Björn Lindblom,et al. Explaining Phonetic Variation: A Sketch of the H&H Theory , 1990 .

[68] Kirsten K. Osen,et al. Anatomy of the Mammalian Cochlear Nuclei; a Review , 1988 .

[69] J. R. Cox,et al. A Mathematical Model of the Mechanics of the Cochlea , 1974 .

[70] Frank H. Guenther,et al. A Modeling Framework for Speech Motor Development and Kinematic Articulator Control , 1995 .

[71] Steve J. Young,et al. Tree-Based State Tying for High Accuracy Modelling , 1994, HLT.

[72] Martin J. Russell,et al. Probabilistic-trajectory segmental HMMs , 1999, Comput. Speech Lang..

[73] M. Hallet,et al. Speech Recognition: A Model and a Program for Research , 1998 .

[74] G. A. Miller,et al. An Analysis of Perceptual Confusions Among Some English Consonants , 1955 .

[75] D. Recasens,et al. Place cues for nasal consonants with special reference to Catalan. , 1983, The Journal of the Acoustical Society of America.

[76] L. A. Westerman,et al. A diffusion model of the transient response of the cochlear inner hair cell synapse. , 1988, The Journal of the Acoustical Society of America.

[77] Mari Ostendorf,et al. The use of prosody in syntactic disambiguation , 1991 .

[78] Masaaki Honda,et al. A model of articulator trajectory formation based on the motor tasks of vocal‐tract shapes , 1996 .

[79] Li Deng,et al. Improved speech modeling and recognition using multi-dimensional articulatory states as primitive speech units , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[80] Hermann Ney,et al. A word graph algorithm for large vocabulary continuous speech recognition , 1994, Comput. Speech Lang..

[81] A. J. Watkins,et al. Effects of spectral contrast on perceptual compensation for spectral-envelope distortion. , 1996, The Journal of the Acoustical Society of America.

[82] Elliot L. Saltzman,et al. A Dynamical Approach to Gestural Patterning in Speech Production , 1989 .

[83] B. Kröger,et al. A gesture‐based dynamic model describing articulatory movement data , 1995 .

[84] Herbert Gish,et al. A segmental speech model with applications to word spotting , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[85] M. Studdert-Kennedy,et al. Theoretical notes. Motor theory of speech perception: a reply to Lane's critical review. , 1970, Psychological review.

[86] Hisashi Tanizaki,et al. Nonlinear filters , 1993 .

[87] Aaron E. Rosenberg,et al. An improved endpoint detector for isolated word recognition , 1981 .

[88] W Jassem,et al. Acoustic Correlates of Stress , 1965, Language and speech.

[89] Thomas F. Quatieri,et al. Shape invariant time-scale and pitch modification of speech , 1992, IEEE Trans. Signal Process..

[90] Roy D. Patterson,et al. A Functional Model of Neural Activity Patterns and Auditory Images , 2004 .

[91] J. Mendel. Lessons in Estimation Theory for Signal Processing, Communications, and Control , 1995 .

[92] A R Palmer,et al. Temporal responses of primarylike anteroventral cochlear nucleus units to the steady-state vowel /i/. , 1990, The Journal of the Acoustical Society of America.

[93] Li Deng,et al. A mixture linear model with target-directed dynamics for spontaneous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[94] S. Öhman. Coarticulation in VCV Utterances: Spectrographic Measurements , 1966 .

[95] Lou Boves,et al. Designing control rules for a serial pole-zero vocal tract model , 1993, EUROSPEECH.

[96] J Hillenbrand,et al. Identification of steady-state vowels synthesized from the Peterson and Barney measurements. , 1993, The Journal of the Acoustical Society of America.

[97] A J van Hessen,et al. Modeling phoneme perception. I: Categorical perception. , 1992, The Journal of the Acoustical Society of America.

[98] D. Klatt. Review of selected models of speech perception , 1989 .

[99] Steven Greenberg,et al. Computational Models of Auditory Function , 2001 .

[100] L. Rabiner,et al. Effects of smoothing and quantizing the parameters of formant-coded voiced speech. , 1971, The Journal of the Acoustical Society of America.

[101] M. Scheffers,et al. Discrimination of fundamental frequency of synthesized vowel sounds in a noise background. , 1984, The Journal of the Acoustical Society of America.

[102] Jae Lim,et al. Evaluation of a correlation subtraction method for enhancing speech degraded by additive white noise , 1978 .

[103] Li Deng,et al. Articulatory Features and Associated Production Models in Statistical Speech Recognition , 1999 .

[104] Man Mohan Sondhi,et al. Techniques for estimating vocal-tract shapes from the speech signal , 1994, IEEE Trans. Speech Audio Process..

[105] Hamid Sheikhzadeh,et al. Interval statistics generated from a cochlear model in response to speech sounds , 1994 .

[106] D. Massaro,et al. Evaluation and integration of acoustic features in speech perception. , 1980, The Journal of the Acoustical Society of America.

[107] Biing-Hwang Juang,et al. A study on speaker adaptation of the parameters of continuous density hidden Markov models , 1991, IEEE Trans. Signal Process..

[108] J. Mariani,et al. Recent advances in speech processing , 1989, International Conference on Acoustics, Speech, and Signal Processing.

[109] S. Hiki. Control Rule of the Tongue Movement for Dynamic Analog Speech Synthesis , 1970 .

[110] L. R. Rabiner,et al. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition , 1983, The Bell System Technical Journal.

[111] Li Deng,et al. Data-driven model construction for continuous speech recognition using overlapping articulatory features , 2000, INTERSPEECH.

[112] M. Sachs,et al. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers. , 1979, The Journal of the Acoustical Society of America.

[113] K. Honda. Organization of tongue articulation for vowels , 1996 .

[114] A. B. Poritz,et al. Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[115] G. E. Peterson,et al. Control Methods Used in a Study of the Vowels , 1951 .

[116] Kuldip K. Paliwal,et al. Automatic Speech and Speaker Recognition: Advanced Topics , 1999 .

[117] Hamid Sheikhzadeh,et al. A layered neural network interfaced with a cochlear model for the study of speech encoding in the auditory system , 1999, Comput. Speech Lang..

[118] W. S. Rhode,et al. A composite model of the auditory periphery for the processing of speech based on the filter response functions of single auditory-nerve fibers. , 1991, The Journal of the Acoustical Society of America.

[119] E. Zwicker. Dependence of post-masking on masker duration and its relation to temporal effects in loudness. , 1984, The Journal of the Acoustical Society of America.

[120] Richard S. McGowan,et al. Recovering articulatory movement from formant frequency trajectories using task dynamics and a genetic algorithm: Preliminary model tests , 1994, Speech Commun..

[121] Mari Ostendorf,et al. From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[122] B Hagerman,et al. Clinical measurements of speech reception threshold in noise. , 1984, Scandinavian audiology.

[123] Dick R. van Bergem,et al. A model of coarticulatory effects on the schwa , 1994, Speech Commun..

[124] S. A. Shamma. The Auditory Processing of Speech. , 1986 .

[125] A K Nábĕlek,et al. Perception of nonlinear and linear formant trajectories. , 1997, The Journal of the Acoustical Society of America.

[126] Raymond D. Kent,et al. Acoustic features of infant vocalic utterances at 3, 6, and 9 months. , 1982, The Journal of the Acoustical Society of America.

[127] Dennis H. Klatt,et al. Software for a cascade/parallel formant synthesizer , 1980 .

[128] Kuldip K. Paliwal,et al. A comparative performance evaluation of pitch estimation methods for TDHS/sub-band coding of speech , 1984, Speech Commun..

[129] Li Deng,et al. A robust compensation strategy for extraneous acoustic variations in spontaneous speech recognition , 2002, IEEE Trans. Speech Audio Process..

[130] R. S. McGowan,et al. Acoustic 1996: Speech production parameters for automatic speech recognition , 1997 .

[131] Shihab Shamma,et al. Auditory Representations of Timbre and Pitch , 1996 .

[132] P Howell,et al. Production and perception of rise time in the voiceless affricate/fricative distinction. , 1983, The Journal of the Acoustical Society of America.

[133] Lawrence R. Rabiner,et al. Speech synthesis by rule: An acoustic domain approach , 1968 .

[134] Louis A. Liporace,et al. Maximum likelihood estimation for multivariate observations of Markov sources , 1982, IEEE Trans. Inf. Theory.

[135] S. Young. Large Vocabulary Continuous Speech Recognition: A Review , 1996 .

[136] R. Patterson,et al. Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform. , 1995, The Journal of the Acoustical Society of America.

[137] Oded Ghitza. Auditory models and human performance in tasks related to speech coding and speech recognition , 1994 .

[138] S. Wood. A radiographic analysis of constriction locations for vowels , 1979 .

[139] Eric D. Young,et al. Response properties of type II and type III units in dorsal cochlear nucleus , 1982, Hearing Research.

[140] R. L. Smith,et al. Adaptation in auditory-nerve fibers: A revised model , 1982, Biological Cybernetics.

[141] S.E. Levinson,et al. Structural methods in automatic speech recognition , 1985, Proceedings of the IEEE.

[142] H M Hanson,et al. Glottal characteristics of female speakers: acoustic correlates. , 1997, The Journal of the Acoustical Society of America.

[143] D B Pisoni,et al. Segmental intelligibility of synthetic speech produced by rule. , 1989, The Journal of the Acoustical Society of America.

[144] Dj Dik Hermes. Timing of pitch movements and accentuation of syllables in Dutch , 1997 .

[145] Kuansan Wang,et al. Self-normalization and noise-robustness in early auditory representations , 1994, IEEE Trans. Speech Audio Process..

[146] E. Young,et al. Responses to tones and noise of single cells in dorsal cochlear nucleus of unanesthetized cats. , 1976, Journal of neurophysiology.

[147] P. Woodland,et al. A computational model of the auditory periphery for speech and hearing research. II. Descending paths. , 1994, The Journal of the Acoustical Society of America.

[148] M M Sondhi,et al. The potential role of speech production models in automatic speech recognition. , 1996, The Journal of the Acoustical Society of America.

[149] E D Young,et al. Organization of dorsal cochlear nucleus type IV unit response maps and their relationship to activation by bandlimited noise. , 1991, Journal of neurophysiology.

[150] Thomas P. Barnwell,et al. Objective measures for speech quality testing , 1978 .

[151] M. Swamy,et al. High resolution formant extraction from linear-prediction phase spectra , 1984 .

[152] J. L. Hall,et al. Model for mechanical to neural transduction in the auditory receptor. , 1974, The Journal of the Acoustical Society of America.

[153] D. Whalen. The Motor Theory of Speech Perception , 2019, Oxford Research Encyclopedia of Linguistics.

[154] Li Deng,et al. Production models as a structural basis for automatic speech recognition , 1997, Speech Commun..

[155] J. R. Resnick,et al. The inverse problem for the vocal tract: numerical methods, acoustical experiments, and speech synthesis. , 1983, The Journal of the Acoustical Society of America.

[156] Thomas Baer,et al. An articulatory synthesizer for perceptual research , 1978 .

[157] Chin-W. Kim,et al. Models of Speech Production , 1972, Formal Aspects of Cognitive Processes.

[158] W. S. Rhode,et al. Physiological studies on neurons in the dorsal cochlear nucleus of cat. , 1986, Journal of neurophysiology.

[159] Yannis Stylianou. Removing linear phase mismatches in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[160] P Ladefoged,et al. Individual differences in vowel production. , 1993, The Journal of the Acoustical Society of America.

[161] Alan V. Oppenheim,et al. All-pole modeling of degraded speech , 1978 .

[162] S. Katagiri,et al. Discriminative Learning for Minimum Error Classification , 2009 .

[163] Xuemin Shen,et al. Maximum likelihood in statistical estimation of dynamic systems: Decomposition algorithm and simulation results , 1997, Signal Process..

[164] Andrej Ljolje,et al. Automatic segmentation of speech for TTS , 1993, EUROSPEECH.

[165] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[166] S. Shamma. Speech processing in the auditory system. II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve. , 1985, The Journal of the Acoustical Society of America.

[167] Shihab A. Shamma. Spatial and temporal processing in central auditory networks , 1989 .

[168] Richard M. Stern,et al. A vector Taylor series approach for environment-independent speech recognition , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[169] V C Tartter,et al. Hearing smiles and frowns in normal and whisper registers. , 1994, The Journal of the Acoustical Society of America.

[170] Allyn E. Hubbard,et al. Analysis and Synthesis of Cochlear Mechanical Function Using Models , 1996 .

[171] M. Sachs,et al. Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate. , 1979, The Journal of the Acoustical Society of America.

[172] Hamid Sheikhzadeh,et al. Speech analysis and recognition using interval statistics generated from a composite auditory model , 1998, IEEE Trans. Speech Audio Process..

[173] Hynek Hermansky,et al. Should recognizers have ears? , 1998, Speech Commun..

[174] Boaz Porat,et al. A course in digital signal processing , 1996 .

[175] Simon King,et al. Speech recognition via phonetically featured syllables , 1998, ICSLP.

[176] S. Neely. Finite difference solution of a two-dimensional mathematical model of the cochlea. , 1981, The Journal of the Acoustical Society of America.

[177] Wolfgang Hess,et al. Pitch Determination of Speech Signals , 1983 .

[178] R. N. Ohde,et al. Physiologic, Acoustic, and Perceptual Aspects of Coarticulation: Implications for the Remediation of Articulatory Disorders , 1981 .

[179] Kenneth N. Stevens,et al. On the quantal nature of speech , 1972 .

[180] C D Geisler,et al. Responses of auditory-nerve fibers to consonant-vowel syllables. , 1981, The Journal of the Acoustical Society of America.

[181] Tatsuya Hirahara,et al. Auditory front end in DTW word recognition under noisy, reverberant, and multispeaker conditions. , 1991 .

[182] E Paulus,et al. Automatic speech recognition using psychoacoustic models. , 1979, The Journal of the Acoustical Society of America.

[183] G. E. Peterson,et al. Duration of Syllable Nuclei in English , 1960 .

[184] L Deng,et al. Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics. , 2000, The Journal of the Acoustical Society of America.

[185] M. Paez,et al. Minimum Mean-Squared-Error Quantization in Speech PCM and DPCM Systems , 1972, IEEE Trans. Commun..

[186] D. Recasens,et al. A model of lingual coarticulation based on articulatory constraints , 1997 .

[187] S E Blumstein,et al. Further evidence of acoustic invariance in speech production: the stop-glide contrast. , 1983, The Journal of the Acoustical Society of America.

[188] Nam-Soo Kim. Nonstationary environment compensation based on sequential estimation , 1998 .

[189] D Kewley-Port,et al. Modeling formant frequency discrimination of female vowels. , 1996, The Journal of the Acoustical Society of America.

[190] V.W. Zue,et al. The use of speech knowledge in automatic speech recognition , 1985, Proceedings of the IEEE.

[191] Vladimir Pavlovic,et al. Dynamic bayesian networks for information fusion with applications to human-computer interfaces , 1999 .

[192] A. Huggins,et al. Just noticeable differences for segment duration in natural speech. , 1969, The Journal of the Acoustical Society of America.

[193] Li Deng,et al. A Bayesian approach to speech feature enhancement using the dynamic cepstral prior , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[194] W. Klein,et al. Vowel spectra, vowel spaces, and vowel identification. , 1970, The Journal of the Acoustical Society of America.

[195] C. L. Searle,et al. Time-domain analysis of auditory-nerve-fiber firing rates. , 1990, The Journal of the Acoustical Society of America.

[196] D.R. Reddy,et al. Speech recognition by machine: A review , 1976, Proceedings of the IEEE.

[197] R I Damper,et al. A computational model of afferent neural activity from the cochlea to the dorsal acoustic stria. , 1991, The Journal of the Acoustical Society of America.

[198] H. Voigt,et al. Evidence of inhibitory interactions between neurons in dorsal cochlear nucleus. , 1980, Journal of neurophysiology.

[199] M M Sondhi. Resonances of a bent vocal tract. , 1986, The Journal of the Acoustical Society of America.

[200] M. Sachs,et al. Representation of stop consonants in the discharge patterns of auditory-nerve fibers. , 1983, The Journal of the Acoustical Society of America.

[201] Ray Meddis,et al. Virtual pitch and phase sensitivity of a computer model of the auditory periphery , 1991 .

[202] Richard F. Lyon,et al. On the importance of time—a temporal representation of sound , 1993 .

[203] H. Voigt,et al. Cross-correlation analysis of inhibitory interactions in dorsal cochlear nucleus. , 1990, Journal of neurophysiology.

[204] Singiresu S. Rao,et al. Optimization Theory and Applications , 1980, IEEE Transactions on Systems, Man, and Cybernetics.

[205] Darragh O'Brien,et al. Concatenative synthesis based on a harmonic model , 2001, IEEE Trans. Speech Audio Process..

[206] W. S. Rhode,et al. The use of intracellular techniques in the study of the cochlear nucleus. , 1985, The Journal of the Acoustical Society of America.

[207] Aaron E. Rosenberg,et al. A subjective evaluation of pitch detection methods using LPC synthesized speech , 1977 .

[208] E. F. Evans,et al. The Dynamic Range Problem: Place and Time Coding at the Level of Cochlear Nerve and Nucleus , 1981 .

[209] Li Deng,et al. Speech trajectory discrimination using the minimum classification error learning , 1998, IEEE Trans. Speech Audio Process..

[210] Hamid Sheikhzadeh,et al. Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization , 1994, IEEE Trans. Speech Audio Process..

[211] S. Haykin,et al. Adaptive Filter Theory , 1986 .

[212] I. Pollack,et al. Intelligibility of Excerpts from Conversation , 1963 .

[213] Xuedong Huang,et al. Semi-continuous hidden Markov models for speech signals , 1990 .

[214] J. L. Miller,et al. A distinction between the effects of sentential speaking rate and semantic congruity on word identification , 1984, Perception & psychophysics.

[215] M. Haggard,et al. Pitch as a voicing cue. , 1970, The Journal of the Acoustical Society of America.

[216] S. N. Jagannathan. Handbook of Sensory Physiology: Auditory System , 1978 .

[217] L. Streeter,et al. Acoustic and perceptual indicators of emotional stress. , 1983, The Journal of the Acoustical Society of America.

[218] Patricia A. Keating,et al. Papers in Laboratory Phonology: The window model of coarticulation: articulatory evidence , 1990 .

[219] Partha Niyogi. Modelling Speaker Variability and Imposing Speaker Constraints in Phonetic Classification , 1992 .

[220] Israel Nelken,et al. Nonlinearity of spectra processing in the dorsal cochlear nucleus (DCN) , 1993 .

[221] Li Deng,et al. A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics , 1999, EUROSPEECH.

[222] Laurel H. Carney,et al. Evaluating Auditory Performance Limits: I. One-Parameter Discrimination Using a Computational Model for the Auditory Nerve , 2001, Neural Computation.

[223] R. Shumway,et al. Dynamic linear models with switching , 1991 .

[224] R. Smits,et al. Evaluation of various sets of acoustic cues for the perception of prevocalic stop consonants. II. Modeling and evaluation. , 1996, The Journal of the Acoustical Society of America.

[225] Katrin Kirchhoff. Syllable-level desynchronisation of phonetic features for speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[226] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[227] Patrick Kenny,et al. A linear predictive HMM for vector-valued observations with applications to speech recognition , 1990, IEEE Trans. Acoust. Speech Signal Process..

[228] Herbert Voigt,et al. The Internal Organization of the Dorsal Cochlear Nucleus , 1981 .

[229] A. Poritz,et al. Hidden Markov models: a guided tour , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[230] D. O'Shaughnessy,et al. Linguistic modality effects on fundamental frequency in speech. , 1983, The Journal of the Acoustical Society of America.

[231] L A Chistovich,et al. Auditory Processing of Speech , 1980, Language and speech.

[232] Biing-Hwang Juang,et al. Maximum likelihood estimation for multivariate mixture observations of markov chains , 1986, IEEE Trans. Inf. Theory.

[233] Richard E. Pastore,et al. Temporal order identification: Some parameter dependencies , 1982 .

[234] Robert C. Moore. Using Natural-Language Knowledge Sources in Speech Recognition , 1999 .

[235] M E Schouten,et al. The case against a speech mode of perception. , 1980, Acta psychologica.

[236] H. Lane,et al. Speech deterioration in postlingually deafened adults. , 1991, The Journal of the Acoustical Society of America.

[237] R L Diehl,et al. Identifying vowels in CVC syllables: effects of inserting silence and noise. , 1981, Perception & psychophysics.

[238] Richard Sproat,et al. Multilingual Text-to-Speech Synthesis: The Bell Labs Approach , 1998, CL.

[239] R. Meddis,et al. A computer model of amplitude-modulation sensitivity of single units in the inferior colliculus. , 1994, The Journal of the Acoustical Society of America.

[240] Hermann Ney,et al. Progress in dynamic programming search for LVCSR , 2000 .

[241] S. Greenberg. Representation of Speech in the Auditory Periphery , 1988 .

[242] I. Lehiste,et al. Role of duration in disambiguating syntactically ambiguous sentences , 1975 .

[243] Kim E. A. Silverman,et al. Evidence for the independent function of intonation contour type, voice quality, and F0 range in signaling speaker affect , 1985 .

[244] Masaaki Honda,et al. A dynamical articulatory model using potential task representation , 1994, ICSLP.

[245] R. N. Ohde,et al. Effect of relative amplitude of frication on perception of place of articulation. , 1991, The Journal of the Acoustical Society of America.

[246] J. Perkell,et al. Invariance and variability in speech processes , 1987 .

[247] A Robert,et al. A composite model of the auditory periphery for simulating responses to complex sounds. , 1999, The Journal of the Acoustical Society of America.

[248] Hsiao-Wuen Hon,et al. Unified frame and segment based models for automatic speech recognition , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[249] A M Liberman,et al. Perception of the speech code. , 1967, Psychological review.

[250] John S. Bridle,et al. The HDM: a segmental hidden dynamic model of coarticulation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[251] Li Deng,et al. Optimization of dynamic regimes in a statistical hidden dynamic model for conversational speech recognition , 1999, EUROSPEECH.

[252] S. Haykin,et al. Pattern Recognition Using a Family of Design Algorithms Based upon the Generalized Probabilistic Descent Method , 2001 .

[253] Aaron E. Rosenberg,et al. A comparative performance study of several pitch detection algorithms , 1976 .

[254] Elliot Saltzman,et al. The dynamical perspectives on speech production: Data and theory , 1986 .

[255] William L. Henke,et al. Dynamic articulatory model of speech production using computer simulation , 1966 .

[256] K. Stevens,et al. Feature geometry and the vocal tract , 1994, Phonology.

[257] Chin-Hui Lee,et al. Bayesian Adaptive Learning and MAP Estimation of HMM , 1996 .

[258] Li Deng,et al. Nonstationary-state hidden Markov model representation of speech signals for speech enhancement , 2002, Signal Process..

[259] Li Deng,et al. Speaker-independent phonetic classification using hidden Markov models with mixtures of trend functions , 1997, IEEE Trans. Speech Audio Process..

[260] Steven Kay,et al. Fundamentals Of Statistical Signal Processing , 2001 .

[261] David B. Pisoni,et al. Text-to-speech: the MITalk system , 1987 .

[262] R H Wilson,et al. Word recognition with segmented-alternated CVC words: a preliminary report on listeners with normal hearing. , 1984, Journal of speech and hearing research.

[263] M.R. Schroeder,et al. Models of hearing , 1975, Proceedings of the IEEE.

[264] L. R. Rabiner,et al. On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition , 1983, The Bell System Technical Journal.

[265] Helen Meng,et al. The Use of Distinctive Features for Automatic Speech Recognition , 1991 .

[266] Chilin Shih,et al. Bell Laboratories Russian text-to-speech system , 1997, EUROSPEECH.

[267] Li Deng,et al. High-performance robust speech recognition using stereo training data , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[268] L D Braida,et al. Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. , 1994, The Journal of the Acoustical Society of America.

[269] Ronald Rosenfeld,et al. A maximum entropy approach to adaptive statistical language modelling , 1996, Comput. Speech Lang..

[270] D. W. Thomas. Linear Prediction of Speech, J.D. Markel, A.H. Gray. Springer-Verlag, Berlin, Heidelberg, New York (1976), xii+288 pp., ISBN: 3-540-07563-1 , 1977 .

[271] Mari Ostendorf,et al. A dynamical system model for generating fundamental frequency for speech synthesis , 1999, IEEE Trans. Speech Audio Process..

[272] Martin Russell,et al. A segmental HMM for speech pattern modelling , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[273] Saito,et al. Fundamentals of Speech Signal Processing , 1986 .

[274] Robert M. Gray,et al. Toeplitz and Circulant Matrices: A Review , 2005, Found. Trends Commun. Inf. Theory.

[275] Dennis H. Klatt,et al. Perception of Segment Duration in Sentence Contexts , 1975 .

[276] K. Payton. Vowel processing by a model of the auditory periphery: A comparison to eighth‐nerve responses , 1988 .

[277] Katrin Kirchhoff,et al. Robust speech recognition using articulatory information , 1998 .

[278] B. Lindblom,et al. Role of articulation in speech perception: clues from production. , 1996, The Journal of the Acoustical Society of America.

[279] Steven Greenberg,et al. Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..

[280] P. Mermelstein. Articulatory model for the study of speech production. , 1973, The Journal of the Acoustical Society of America.

[281] D O Kim,et al. Spatial response profiles of posteroventral cochlear nucleus neurons and auditory-nerve fibers in unanesthetized decerebrate cats: response to pure tones. , 1991, The Journal of the Acoustical Society of America.

[282] R. B. Monsen,et al. The accuracy of formant frequency measurements: a comparison of spectrographic analysis and linear prediction. , 1983, Journal of speech and hearing research.

[283] Alex Bateman,et al. An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[284] Li Deng,et al. Integrated-multilingual speech recognition using universal phonological features in a functional speech production model , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[285] Li Deng,et al. A Markov model containing state-conditioned second-order non-stationarity: application to speech recognition , 1995, Comput. Speech Lang..

[286] Alex Acero,et al. Automatic generation of synthesis units for trainable text-to-speech systems , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[287] Yannis Stylianou,et al. TD-PSOLA versus harmonic plus noise model in diphone based speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[288] Philip N. Garner,et al. Using formant frequencies in speech recognition , 1997, EUROSPEECH.

[289] Renato De Mori,et al. A Cache-Based Natural Language Model for Speech Recognition , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[290] J L Miller,et al. The influence of sentential speaking rate on the internal structure of phonetic categories. , 1994, The Journal of the Acoustical Society of America.

[291] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[292] Hamid Sheikhzadeh,et al. Comparative performance of spectral subtraction and HMM-based speech enhancement strategies with application to hearing aid design , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[293] J Hillenbrand,et al. Perception of the voiced-voiceless contrast in syllable-final stops. , 1984, The Journal of the Acoustical Society of America.

[294] Enrico Mugnaini,et al. Neuronal Circuits in the Dorsal Cochlear Nucleus , 1981 .

[295] Li Deng,et al. Large-vocabulary speech recognition under adverse acoustic environments , 2000, INTERSPEECH.

[296] Hamid Sheikhzadeh,et al. HMM-based strategies for enhancement of speech signals embedded in nonstationary noise , 1998, IEEE Trans. Speech Audio Process..

[297] Hamid Sheikhzadeh,et al. Real-time speech synthesis on an ultra low-resource, programmable DSP system , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[298] Li Deng,et al. Computational Models for Speech Production , 2018, Speech Processing.

[299] H. Sussman,et al. An investigation of locus equations as a source of relational invariance for stop place categorization , 1991 .

[300] John E. Shore,et al. Discrete utterance speech recognition without time alignment , 1983, IEEE Trans. Inf. Theory.

[301] J J Jenkins,et al. Vowel identification in mixed-speaker silent-center syllables. , 1994, The Journal of the Acoustical Society of America.

[302] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[303] C. L. Searle,et al. Stop consonant discrimination based on human audition. , 1979, The Journal of the Acoustical Society of America.

[304] Li Deng,et al. A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model , 2001, IEEE Trans. Speech Audio Process..

[305] A. Liberman,et al. Some Cues for the Distinction Between Voiced and Voiceless Stops in Initial Position , 1957 .

[306] Raymond D. Kent,et al. Chapter 3 – Models of Speech Production , 1976 .

[307] Carl E. Rasmussen,et al. The Infinite Gaussian Mixture Model , 1999, NIPS.

[308] A.V. Oppenheim,et al. The importance of phase in signals , 1980, Proceedings of the IEEE.

[309] Martin J. Russell,et al. Modeling speech variability with segmental HMMs , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[310] S. Soli,et al. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. , 1994, The Journal of the Acoustical Society of America.

[311] D. Klatt,et al. Structure of a phonological rule component for a synthesis-by-rule program , 1976 .

[312] Paul Mermelstein,et al. Difference limens for formant frequencies of steady‐state and consonant‐bound vowels , 1976 .

[313] M. Sachs,et al. Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus , 1988 .

[314] J L Miller. Nonindependence of feature processing in initial consonants. , 1977, Journal of speech and hearing research.

[315] Li Deng,et al. A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition , 1998, Speech Commun..

[316] S. McCandless,et al. An algorithm for automatic formant extraction using linear prediction spectra , 1974 .

[317] Max A. Viergever,et al. Mechanics of the inner ear: A mathematical approach , 1980 .

[318] Laurel H. Carney,et al. Evaluating Auditory Performance Limits: II. One-Parameter Discrimination with Random-Level Variation , 2001, Neural Computation.

[319] Li Deng,et al. A Bayesian Approach to Speaker Verification , 2001 .

[320] K. Stevens,et al. Emotions and speech: some acoustical correlates. , 1972, The Journal of the Acoustical Society of America.

[321] Li Deng,et al. HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features , 1997, IEEE Trans. Speech Audio Process..

[322] B. Moore,et al. Frequency and intensity difference limens for harmonics within complex tones. , 1984, The Journal of the Acoustical Society of America.

[323] Y Xu,et al. Production and perception of coarticulated tones. , 1994, The Journal of the Acoustical Society of America.

[324] J. 't Hart,et al. Discriminability of the size of pitch movements in speech , 1974 .

[325] D. Ladd,et al. Declination: a review and some hypotheses , 1984, Phonology Yearbook.

[326] Li Deng,et al. Initial evaluation of hidden dynamic models on conversational speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[327] Mandy Eberhart,et al. Speech Communications Human And Machine , 2016 .

[328] Oded Ghitza,et al. Hidden Markov models with templates as non-stationary states: an application to speech recognition , 1993, Comput. Speech Lang..

[329] D. O'Shaughnessy. Consonant durations in clusters , 1974 .

[330] F Rattay,et al. The mammalian auditory hair cell: a simple electric circuit model. , 1998, The Journal of the Acoustical Society of America.

[331] Stefanie Shattuck-Hufnagel,et al. Implementation of a model for lexical access based on features , 1992, ICSLP.

[332] Chin-Hui Lee,et al. On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate , 1997, IEEE Trans. Speech Audio Process..

[333] Li Deng,et al. Transitional speech units and their representation by regressive Markov states: applications to speech recognition , 1996, IEEE Trans. Speech Audio Process..

[334] A. Liberman,et al. Tempo of frequency change as a cue for distinguishing classes of speech sounds. , 1956, Journal of experimental psychology.

[335] B. Lindblom,et al. Interaction between duration, context, and speaking style in English stressed vowels , 1994 .

[336] W. Brownell,et al. Synaptic organization of eighth nerve afferents to cat dorsal cochlear nucleus. , 1983, Journal of neurophysiology.

[337] Antonio M. Peinado,et al. Model-based compensation of the additive noise for continuous speech recognition. Experiments using the Aurora II database and tasks , 2001, INTERSPEECH.

[338] A. Liberman,et al. The motor theory of speech perception revised , 1985, Cognition.

[339] M. Halle,et al. Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates , 1961 .

[340] John Hart,et al. A Perceptual Study of Intonation , 1990 .