Speech recognition of different sampling rates using fractal code descriptor

Speech recognition is increasingly used in applications such as mobile device interaction, interactive voice response systems, voice search, voice dictation, and voice identification. At the heart of these applications are the speech features that represent the input signal. In practice, however, speech is captured at various sampling rates, and different sampling rates yield different features for the same utterance, which causes the recognition rate to drop. This paper therefore proposes a resolution-independent descriptor based on fractal codes obtained through fractal encoding and decoding. The encoding process extracts fractal codes from partitioned speech signals, and the decoding process reconstructs resolution-independent speech signals from those codes. The method can reconstruct a speech signal at any sampling rate, including rates higher than the original, which is the particularly challenging case. The proposed method is evaluated on the AN4 corpus using the CMU Sphinx speech recognition engine. The experimental results show that it can improve recognition accuracy even when the sampling rate of the test speech differs from that of the training speech.
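The encode/decode idea described above can be illustrated with a minimal one-dimensional fractal coder. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`encode`, `decode`), the non-overlapping domain pool, the pairwise-averaging downsampler, the affine map `s*d + o`, and the clamp on `s` are all choices made here for illustration. Each range block of `r_size` samples is approximated by a shrunken domain block of twice that length; decoding iterates the stored maps from an all-zero signal, and using blocks `scale` times larger yields an output at `scale` times the original sampling rate.

```python
import math

def _shrink(block):
    # Average adjacent pairs: a domain block of 2R samples becomes R samples.
    return [(block[i] + block[i + 1]) / 2.0 for i in range(0, len(block), 2)]

def _fit(d, r, s_max=0.9):
    # Least-squares scale s and offset o so that s*d + o approximates r.
    n = len(d)
    md, mr = sum(d) / n, sum(r) / n
    var = sum((x - md) ** 2 for x in d)
    cov = sum((x - md) * (y - mr) for x, y in zip(d, r))
    s = cov / var if var > 1e-12 else 0.0
    s = max(-s_max, min(s_max, s))  # assumption: clamp keeps the map contractive
    o = mr - s * md
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err

def encode(signal, r_size=4):
    # Assumes len(signal) is a multiple of 2 * r_size.
    d_size = 2 * r_size
    domains = [_shrink(signal[i:i + d_size])
               for i in range(0, len(signal) - d_size + 1, d_size)]
    codes = []
    for start in range(0, len(signal), r_size):
        r = signal[start:start + r_size]
        best = None
        for k, d in enumerate(domains):
            s, o, err = _fit(d, r)
            if best is None or err < best[3]:
                best = (k, s, o, err)
        codes.append(best[:3])  # fractal code: (domain index, scale, offset)
    return codes

def decode(codes, r_size=4, scale=1, iterations=12):
    # Block sizes grow with `scale`, so the same codes decode at any resolution.
    rng = r_size * scale
    dom = 2 * rng
    out = [0.0] * (len(codes) * rng)
    for _ in range(iterations):
        nxt = []
        for k, s, o in codes:
            d = _shrink(out[k * dom:(k + 1) * dom])
            nxt.extend(s * x + o for x in d)
        out = nxt
    return out

# Toy input: 64 samples of a sine wave (not real speech).
signal = [math.sin(2 * math.pi * i / 16) for i in range(64)]
codes = encode(signal)          # 16 codes, one per range block
same = decode(codes, scale=1)   # 64 samples, original rate
up = decode(codes, scale=2)     # 128 samples, twice the original rate
```

Because the codes store only block relationships rather than samples, the decoder is free to choose the output length; features for recognition would then be computed from the signal decoded at the rate the recognizer was trained on.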
