Intelligibility Enhancement of Speech in Noise

Speech technology can facilitate human-machine interaction and create new communication interfaces. Text-to-Speech (TTS) systems provide speech output for dialogue, notification and reading applications, as well as personalized voices for people who have lost the use of their own. TTS systems are built to produce synthetic voices that should sound as natural, expressive and intelligible as possible and, if necessary, resemble a particular speaker. Although naturalness is an important requirement, conveying the correct information in adverse conditions can be crucial to certain applications. Speech that adapts or reacts to different listening conditions can in turn be more expressive and natural.

In this work we focus on enhancing the intelligibility of TTS voices in additive noise. We adopt the statistical parametric paradigm for TTS in the form of a hidden Markov model (HMM) based speech synthesis system, which allows for flexible enhancement strategies. Little is known about which human speech production mechanisms actually increase intelligibility in noise, or how the choice of mechanism relates to noise type, so we approached the problem from another perspective: using mathematical models of hearing speech in noise. To find out which models best predict the intelligibility of TTS in noise, we performed listening evaluations to collect subjective intelligibility scores, which we then compared to the models' predictions. In these evaluations we observed that modifications of the spectral envelope of speech can increase intelligibility significantly, particularly when the strength of the modification depends on the noise and its level.

These findings informed the choice of model to use when automatically modifying the spectral envelope of the speech according to the noise. We devised two methods, both based on cepstral coefficient modification: the first is applied during feature extraction when training the acoustic models, the second at generation time using pre-trained TTS models. The latter has the advantage of being able to address fluctuating noise. To increase the intelligibility of synthetic speech at generation time we proposed a method for Mel cepstral coefficient modification based on the glimpse proportion measure, the most promising of the speech intelligibility models we evaluated. An extensive series of listening experiments demonstrated that this method brings significant intelligibility gains to TTS voices without requiring additional recordings of clear or Lombard speech. To further improve intelligibility we combined our method with noise-independent enhancement approaches based on the acoustics of highly intelligible speech. This combined solution was as effective for stationary noise as for the challenging competing-talker noise.
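The glimpse proportion measure mentioned above scores intelligibility as the fraction of spectro-temporal regions in which the speech is audible above the noise. As a rough illustration only, the sketch below computes a simplified glimpse proportion from FFT-based power spectrograms: the proportion of time-frequency cells whose local SNR exceeds a threshold. The 3 dB threshold, the STFT settings and the use of an FFT spectrogram in place of the auditory (gammatone) filterbank of the full measure are illustrative assumptions, not the settings used in this work.

```python
# Minimal sketch of a glimpse-proportion-style measure (assumptions noted above).
import numpy as np


def stft_power(signal, frame_len=512, hop=256):
    """Power spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def glimpse_proportion(speech, noise, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise
    by at least `threshold_db` (a common 'glimpse' criterion)."""
    s_pow = stft_power(speech)
    n_pow = stft_power(noise)
    n_frames = min(len(s_pow), len(n_pow))
    local_snr_db = 10.0 * np.log10(
        (s_pow[:n_frames] + 1e-12) / (n_pow[:n_frames] + 1e-12))
    return float(np.mean(local_snr_db > threshold_db))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)       # stand-in for a speech waveform
    noise = 0.5 * rng.standard_normal(16000)  # stand-in for the additive noise
    print(f"GP = {glimpse_proportion(speech, noise):.3f}")
```

A modification strategy such as the one described above would then adjust the spectral envelope (e.g. via the Mel cepstral coefficients) so that this proportion increases for the given noise, under an overall energy constraint.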
