Speaker-independent recognition of two-character Chinese words using feature fusion of broadband and narrowband spectrograms

This paper presents a speaker-independent method for recognizing two-character Chinese words by fusing features extracted from broadband and narrowband spectrograms. The spectrograms are treated as images, and image-processing techniques are used to extract features for speech recognition. First, equal-width zoning line projection and binary-width zoning line projection are applied to the narrowband spectrogram, giving the first and second feature sets, respectively. Next, equal-width zoning line projection is applied to the narrowband spectrogram after a Fourier transform, giving the third feature set. Then, equal-width column projection is applied to the broadband spectrogram, giving the fourth feature set. The four feature sets are fused into feature vectors for a support vector machine (SVM) classifier, which recognizes each two-character word as a whole, independent of the speaker. In a simulation experiment with a total of 1000 speech samples, fusing the four feature sets yields a correct recognition rate of 93.6%. This feature-fusion method offers a new approach to whole-word recognition of Chinese vocabulary.
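The abstract does not give window lengths, zone counts, the binarization rule, or which Fourier transform is applied, so the Python/NumPy sketch below is one possible reading of the four-feature pipeline, not the authors' implementation. Assumptions: a 32 ms window for the narrowband and a 4 ms window for the broadband spectrogram at 16 kHz; "line projection" means projecting onto frequency rows (summing over time) and "column projection" means projecting onto time columns (summing over frequency); 16 equal-width zones per projection; binarization at the spectrogram mean; a 2-D FFT magnitude for the third set; and per-set L2 normalization before concatenation. The function names `spectrogram`, `zoned_projection`, `unit`, and `fused_features` are hypothetical.

```python
import numpy as np


def spectrogram(x, win_len, hop):
    """Log-magnitude spectrogram as an image: rows = frequency bins, columns = frames."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=0)))


def zoned_projection(img, n_zones, axis):
    """Project the image along `axis`, then average the profile over n_zones
    equal-width zones so the vector length does not depend on utterance length."""
    profile = img.sum(axis=axis)
    return np.array([zone.mean() for zone in np.array_split(profile, n_zones)])


def unit(v):
    """Scale a feature set to unit L2 norm so no single set dominates the fusion."""
    return v / (np.linalg.norm(v) + 1e-12)


def fused_features(x, fs=16000, n_zones=16):
    # Assumed windows: long (32 ms) for narrowband, which resolves harmonics;
    # short (4 ms) for broadband, which shows formants and timing.
    nb = spectrogram(x, fs * 32 // 1000, fs * 8 // 1000)
    bb = spectrogram(x, fs * 4 // 1000, fs * 2 // 1000)
    # Set 1: equal-width zoning line (row) projection of the narrowband image.
    f1 = zoned_projection(nb, n_zones, axis=1)
    # Set 2: same projection on a binarized image (threshold = mean, assumed).
    f2 = zoned_projection((nb > nb.mean()).astype(float), n_zones, axis=1)
    # Set 3: line projection after a 2-D FFT of the narrowband image (assumed).
    f3 = zoned_projection(np.abs(np.fft.fft2(nb)), n_zones, axis=1)
    # Set 4: equal-width column (time) projection of the broadband image.
    f4 = zoned_projection(bb, n_zones, axis=0)
    return np.concatenate([unit(f) for f in (f1, f2, f3, f4)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000
    x = np.sin(2 * np.pi * 220 * t) * np.hanning(len(t)) + 0.01 * rng.standard_normal(len(t))
    print(fused_features(x).shape)  # (64,)
```

Because each projection profile is averaged over a fixed number of zones, utterances of different durations map to vectors of the same length (here 4 × 16 = 64), which a standard SVM requires. The fused vectors could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`, with one class per word; the kernel and its parameters would need tuning on real data. Inputs must be long enough to give at least `n_zones` broadband frames.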
