Inferring Hearing Loss from Learned Speech Kernels

Does a hearing-impaired individual's speech reflect their hearing loss, and if it does, can the nature of the loss be inferred from their speech? To investigate these questions, at least four hours of speech were recorded from each of 37 adults, both male and female, belonging to four classes: 7 with normal hearing, and 30 with severe-to-profound hearing impairment and high, medium, or low speech intelligibility. Acoustic kernels were learned for each individual by capturing the distribution of their speech data points, represented as 20 ms windows. The kernels were evaluated with a set of neurophysiological metrics: the distribution of characteristic frequencies, the equal-loudness contour, and the bandwidth and Q10 value of the tuning curve. Our experimental results reveal that a hearing-impaired individual's speech does reflect their hearing loss, provided the loss has considerably affected the intelligibility of their speech. For such individuals, a lack of tuning in any frequency range can be inferred from the learned speech kernels.
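
A minimal sketch of the kernel-learning and evaluation pipeline is given below, in Python with NumPy. It uses spherical k-means to learn unit-norm kernels from 20 ms windows, one of several unsupervised methods in the literature and not necessarily the one used in this work, and then estimates each kernel's characteristic frequency (its spectral peak) and Q10 value (characteristic frequency divided by the 10 dB bandwidth, the standard definition from auditory physiology). All function names and parameter values (k = 64 kernels, a 4096-point FFT) are illustrative assumptions.

    import numpy as np

    def frame_signal(x, fs, win_ms=20.0):
        """Slice a speech signal into non-overlapping 20 ms windows."""
        n = int(fs * win_ms / 1000.0)
        num = len(x) // n
        frames = x[:num * n].reshape(num, n)
        # Unit-normalize each frame so clustering is driven by spectral
        # shape rather than energy.
        norms = np.linalg.norm(frames, axis=1, keepdims=True)
        return frames / np.maximum(norms, 1e-12)

    def spherical_kmeans(frames, k=64, iters=50, seed=0):
        """Learn k unit-norm acoustic kernels by clustering frames on the
        unit sphere (an assumed method, for illustration only)."""
        rng = np.random.default_rng(seed)
        kernels = frames[rng.choice(len(frames), k, replace=False)]
        for _ in range(iters):
            # Assign each frame to the kernel with highest cosine similarity.
            labels = (frames @ kernels.T).argmax(axis=1)
            for j in range(k):
                members = frames[labels == j]
                if len(members):
                    mean = members.sum(axis=0)
                    kernels[j] = mean / np.linalg.norm(mean)
        return kernels

    def cf_and_q10(kernel, fs, nfft=4096):
        """Characteristic frequency (spectral peak) and Q10 of one kernel."""
        spec = np.abs(np.fft.rfft(kernel, n=nfft))
        freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
        peak = spec.argmax()
        cf = freqs[peak]
        # 10 dB bandwidth: width of the contiguous region around the peak
        # where the amplitude stays within 10 dB of the maximum.
        thresh = spec[peak] / np.sqrt(10)  # -10 dB in amplitude
        lo = peak
        while lo > 0 and spec[lo] >= thresh:
            lo -= 1
        hi = peak
        while hi < len(spec) - 1 and spec[hi] >= thresh:
            hi += 1
        bw = freqs[hi] - freqs[lo]
        return cf, (cf / bw if bw > 0 else np.inf)

Given a recording x sampled at fs Hz, kernels = spherical_kmeans(frame_signal(x, fs)) yields the learned kernels; an absence of kernels with characteristic frequencies in some band, or abnormally broad (low-Q10) kernels there, is the kind of evidence the abstract describes as a lack of tuning in that frequency range.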
