Speech recognition by clustering wavelet and PLP coefficients

This thesis explores the use of K-means clustering, wavelets and perceptual linear predictive (PLP) analysis for speech recognition. It has three aims: to compare the performance of this method with that of previous speech recognition techniques, to test whether both wavelet and PLP coefficients are important to the analysis, and to evaluate several ways of improving the clustering method. Phonetic classification is used as the basis of comparison. Three different sets of phonemes are chosen from the TIMIT database: 16 vowels, 24 consonants and 39 phonetic classes. Coefficients of the Haar wavelet transform and of fifth-order PLP analysis are combined to form a 42-dimensional vector for each phoneme. Clusters of phoneme vectors obtained by K-means clustering are then used to classify test vectors. Classification experiments using the NIST train and test sets show that independent clustering of phonemes with proportional phoneme emphasis is the best clustering strategy. It yields an accuracy of 55.4% for vowel classification, 54.6% for consonant classification and 50.9% for phoneme classification. Tests with wavelet-only vectors and PLP-only vectors show that both the wavelet transform and PLP analysis contribute significantly to the phonetic classifier. Results also show that wavelet coefficients are useful for detecting sound transitions, which are abundant in consonants.

Thesis Supervisor: Kenneth Yip
Title: Visiting Assistant Professor
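The classification step described in the abstract (cluster each phoneme's training vectors independently with K-means, then assign a test vector to the phoneme that owns the nearest cluster centre) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names, the squared Euclidean distance, the number of clusters per phoneme and the initialisation scheme are all assumptions, and the "proportional phoneme emphasis" mentioned above is not modelled because the abstract does not define it.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on the rows of X; returns a (k, dim) centroid array."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct training vectors.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave a centroid in place if its cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def train(X, y, k_per_class=3):
    """Cluster each phoneme class independently (hypothetical helper)."""
    model = {}
    for label in sorted(set(y.tolist())):
        Xc = X[y == label]
        model[label] = kmeans(Xc, min(k_per_class, len(Xc)))
    return model


def classify(model, x):
    """Return the phoneme label whose nearest centroid is closest to x."""
    best_label, best_dist = None, np.inf
    for label, centroids in model.items():
        dist = ((centroids - x) ** 2).sum(axis=1).min()
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Synthetic 42-dimensional vectors standing in for wavelet + PLP features.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 42)), rng.normal(10, 1, (30, 42))])
    y = np.array(["aa"] * 30 + ["s"] * 30)
    model = train(X, y)
    print(classify(model, np.zeros(42)))
```

The toy data here are random and only exercise the code path; real input would be the 42-dimensional vectors built from Haar wavelet and PLP coefficients of TIMIT phoneme segments.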
