Specific two-word lexical semantic recognition based on the wavelet transform of the narrowband spectrogram

This paper presents a method for recognizing specific two-word Chinese vocabulary based on the wavelet transform of the narrowband spectrogram. For feature extraction, image processing techniques are applied to speech recognition. First, a two-dimensional discrete db4 wavelet is used to decompose the narrowband spectrogram through a 6-level wavelet packet decomposition, and the approximation energy is calculated. Then, the extracted approximation is further divided into horizontal, vertical, and diagonal detail energy values, which serve respectively as the first, second, and third feature sets of the narrowband spectrogram. These three feature sets are used as feature vectors, with a support vector machine (SVM) as the classifier, for whole-word recognition of two-word Chinese vocabulary. In a simulation experiment with 1000 speech samples, the method reaches a correct recognition rate of 96 percent.
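The wavelet energy feature extraction described above can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. It uses a standard multilevel 2-D DWT, where only the approximation is re-decomposed at each level, with periodic boundary extension instead of a full wavelet packet tree. It also assumes that the narrowband spectrogram has already been computed as a matrix with frequency bins as rows and frames as columns, and resized so both sides are divisible by 2**6 (e.g. 64 × 64). The horizontal, vertical, and diagonal detail energies at each level are collected as the three feature sets. All function names are hypothetical.

```python
# Hypothetical sketch: db4 2-D wavelet energy features from a spectrogram matrix.
# Assumptions (not from the paper): periodic boundary extension, a standard
# multilevel DWT instead of a full wavelet packet tree, per-level energies.

# Daubechies db4 low-pass analysis filter (8 taps, 4 vanishing moments).
DB4_LO = [
    0.2303778133088964, 0.7148465705529154, 0.6308807679298587,
    -0.0279837694168599, -0.1870348117190931, 0.0308413818355607,
    0.0328830116668852, -0.0105974017850690,
]
# Quadrature-mirror high-pass filter: g[m] = (-1)**m * h[L - 1 - m].
DB4_HI = [(-1) ** m * DB4_LO[len(DB4_LO) - 1 - m] for m in range(len(DB4_LO))]


def dwt1d(x):
    """One level of a periodized orthogonal DWT on an even-length sequence."""
    n = len(x)
    lo = [0.0] * (n // 2)
    hi = [0.0] * (n // 2)
    for k in range(n // 2):
        for m, (h, g) in enumerate(zip(DB4_LO, DB4_HI)):
            v = x[(2 * k + m) % n]  # wrap around at the edges (periodization)
            lo[k] += h * v
            hi[k] += g * v
    return lo, hi


def _split_columns(mat):
    """Apply dwt1d down every column; return (low-pass rows, high-pass rows)."""
    cols = [dwt1d(list(col)) for col in zip(*mat)]
    lo = [list(row) for row in zip(*(c[0] for c in cols))]
    hi = [list(row) for row in zip(*(c[1] for c in cols))]
    return lo, hi


def dwt2d(img):
    """One 2-D level: filter each row, then each column."""
    row_lo, row_hi = zip(*(dwt1d(row) for row in img))
    cA, cH = _split_columns(row_lo)  # low along rows; low/high down columns
    cV, cD = _split_columns(row_hi)  # high along rows; low/high down columns
    return cA, (cH, cV, cD)


def energy(mat):
    """Sum of squared coefficients."""
    return sum(v * v for row in mat for v in row)


def wavelet_energy_features(spec, levels=6):
    """Return (approximation energy, [H energies], [V energies], [D energies])."""
    approx = [[float(v) for v in row] for row in spec]
    h_feat, v_feat, d_feat = [], [], []
    for _ in range(levels):
        if len(approx) % 2 or len(approx[0]) % 2:
            raise ValueError("both sides must be divisible by 2**levels")
        approx, (cH, cV, cD) = dwt2d(approx)
        h_feat.append(energy(cH))
        v_feat.append(energy(cV))
        d_feat.append(energy(cD))
    return energy(approx), h_feat, v_feat, d_feat
```

For a 64 × 64 matrix, `a, h, v, d = wavelet_energy_features(spec)` returns six energies per detail orientation. Concatenating `h + v + d` gives an 18-dimensional vector that could then be passed to an SVM, for instance LIBSVM or scikit-learn's `sklearn.svm.SVC`. Here "horizontal" detail means high-pass filtering down the columns, across frequency, so it responds to horizontal structures such as harmonic stripes. Because the periodized db4 transform is orthogonal, the approximation and detail energies sum to the energy of the input, which gives a convenient sanity check.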
