The Hearing-Aid Speech Perception Index (HASPI)

Abstract: This paper presents a new index for predicting speech intelligibility for normal-hearing and hearing-impaired listeners. The Hearing-Aid Speech Perception Index (HASPI) is based on a model of the auditory periphery that incorporates changes due to hearing loss. The index compares the envelope and temporal fine structure outputs of the auditory model for a reference signal to the corresponding outputs for the signal under test. The auditory model for the reference signal is set for normal hearing, while the model for the test signal incorporates the peripheral hearing loss. The new index is compared to two existing kinds of index: one based on the coherence between the reference and test signals, and one based on the envelope correlation between the two signals. HASPI is found to give accurate intelligibility predictions for a wide range of signal degradations, including speech degraded by noise and nonlinear distortion, speech processed using frequency compression, noisy speech processed through a noise-suppression algorithm, and speech where the high frequencies are replaced by the output of a noise vocoder. The coherence and envelope metrics used for comparison each give poor performance for at least one of these test conditions.
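As a rough illustration of what "measuring the envelope correlation between the two signals" can mean, the following minimal sketch correlates crude broadband envelopes of a reference and a test signal. It is not the HASPI computation and not any of the comparison metrics evaluated in the paper: there is no auditory model, no filterbank, and no hearing-loss stage. The function names (`envelope`, `pearson`, `envelope_correlation`), the rectify-and-smooth envelope, the window length, and the clipped test signal are all assumptions made for this example.

```python
import math


def envelope(signal, window):
    """Crude envelope (assumption): full-wave rectify, then moving-average smooth."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    smoothed = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        smoothed.append(sum(rectified[lo:hi]) / (hi - lo))
    return smoothed


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a flat envelope carries no modulation to correlate
    return cov / math.sqrt(var_a * var_b)


def envelope_correlation(reference, test, window=33):
    """Correlate the smoothed envelopes of a reference and a test signal."""
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    return pearson(envelope(reference, window), envelope(test, window))


def make_am_tone(fs=8000, duration=0.5, carrier_hz=1000.0, mod_hz=4.0):
    """Hypothetical stand-in for speech: a 1 kHz tone with 4 Hz amplitude modulation."""
    n = int(fs * duration)
    return [
        (1.0 + 0.8 * math.sin(2 * math.pi * mod_hz * i / fs))
        * math.sin(2 * math.pi * carrier_hz * i / fs)
        for i in range(n)
    ]


def hard_clip(signal, limit):
    """Simple nonlinear distortion: clip samples to the range [-limit, +limit]."""
    return [max(-limit, min(limit, x)) for x in signal]


reference = make_am_tone()
clipped = hard_clip(reference, 0.5)
print("reference vs itself :", envelope_correlation(reference, reference))
print("reference vs clipped:", envelope_correlation(reference, clipped))
```

In this toy case the clipped signal scores lower than the undistorted one because clipping flattens the envelope peaks. An index of the kind described in the abstract would instead compare the outputs of the auditory model for the two signals, with the test-signal model adjusted for the listener's hearing loss.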
