Classification of Fricative Consonants for Speech Enhancement in Hearing Devices

Objective: To investigate a set of acoustic features and classification methods for distinguishing three groups of fricative consonants that differ in place of articulation.

Method: A support vector machine (SVM) algorithm was used to classify fricatives extracted from the TIMIT database, both in quiet and in speech babble noise at various signal-to-noise ratios (SNRs). The spectral features used for classification were four spectral moments, spectral peak, spectral slope, Mel-frequency cepstral coefficients (MFCCs), Gammatone filter outputs, and magnitudes of the fast Fourier transform (FFT) spectrum. The analysis frame was restricted to 8 ms. In addition, commonly used linear and nonlinear principal component analysis techniques for dimensionality reduction, which project a high-dimensional feature vector onto a lower-dimensional space, were examined.

Results: With 13 MFCCs, or with 14 or 24 Gammatone filter outputs, classification accuracy was at least 85% in quiet and at +10 dB SNR. With 14 Gammatone filter outputs above 1 kHz, classification accuracy remained above 80% across SNRs from +20 dB down to +5 dB.

Conclusions: High classification accuracy for fricative consonants, in quiet and in noise, can be achieved using only spectral features extracted from a short time window. These results bear directly on the development of speech enhancement algorithms for hearing devices.
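As an illustration of one of the feature types listed above, the four spectral moments of a single 8 ms frame can be sketched as follows. This is a minimal sketch and not the authors' implementation: the abstract does not state the window type, the FFT length, or the exact moment definitions, so the Hamming window, the 128-point frame (8 ms at the 16 kHz sampling rate of TIMIT recordings), the use of the power spectrum as the weighting, and the function name `spectral_moments` are all assumptions made here for illustration.

```python
import numpy as np


def spectral_moments(frame, sample_rate=16000):
    """Return (centroid_hz, variance, skewness, excess_kurtosis) of one frame.

    Assumption: the normalized power spectrum is treated as a probability
    distribution over frequency; the source does not specify this choice.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))          # assumed window type
    power = np.abs(np.fft.rfft(windowed)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    total = power.sum()
    if total == 0.0:
        raise ValueError("frame has no energy; moments are undefined")
    p = power / total                                   # weights sum to 1

    centroid = np.sum(freqs * p)                        # first moment (mean)
    variance = np.sum((freqs - centroid) ** 2 * p)      # second central moment
    std = np.sqrt(variance)
    skewness = np.sum((freqs - centroid) ** 3 * p) / std ** 3
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / std ** 4 - 3.0
    return centroid, variance, skewness, kurtosis


# Usage on a synthetic frame: a 4 kHz tone stands in for real speech data.
sample_rate = 16000
n = round(0.008 * sample_rate)                          # 8 ms -> 128 samples
t = np.arange(n) / sample_rate
tone = np.sin(2 * np.pi * 4000 * t)
centroid, variance, skewness, kurtosis = spectral_moments(tone, sample_rate)
```

For a real fricative segment, the resulting four values would form one part of the feature vector passed to the classifier; the synthetic tone is used here only so that the sketch runs without the TIMIT data.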
