Intelligibility of time-compressed speech: the effect of uniform versus non-uniform time-compression algorithms.

To assess hearing aid algorithms, a method is sought that shifts the threshold of a speech-in-noise test to (mostly positive) signal-to-noise ratios (SNRs) that allow discrimination between algorithmic settings and are most relevant for hearing-impaired listeners in daily life. Time-compressed speech with higher speech rates was therefore evaluated as a way to increase the difficulty of the test parametrically while preserving most of the relevant acoustic speech cues. A uniform and a non-uniform algorithm were used to compress the sentences of the German Oldenburg Sentence Test to different speech rates. Of the two, the non-uniform algorithm exhibited greater deviations from the targeted time compression, as well as greater changes in phoneme duration, spectra, and modulation spectra. Speech intelligibility for fast Oldenburg sentences in background noise at different SNRs was determined with 48 normal-hearing listeners. The results confirmed that intelligibility decreases with increasing speech rate. Speech had to be compressed to more than 30% of its original length to reach 50% intelligibility at positive SNRs. Characteristics that influence the ability of the test to discriminate effective SNR changes were also investigated. Subjective and objective measures indicated a clear advantage of the uniform algorithm over the non-uniform algorithm for use in speech-in-noise tests.
