Speaker independent continuous speech and isolated digit recognition using VQ and HMM

The main objective of this paper is to explore the effectiveness of perceptual features for isolated digit and continuous speech recognition. The proposed perceptual features are extracted from the speech signal and converted into codebook indices. The expectation-maximization algorithm is then used to train an HMM for each utterance. The recognition system is evaluated on clean test utterances: a test utterance is recognized as the one whose HMM gives the maximum log-likelihood for the test features. Performance is tested on utterances chosen at random from the “TI Digits_1”, “TI Digits_2” and “TIMIT” databases, using two speech modeling techniques: VQ alone, and a combination of VQ and HMM. Perceptual linear predictive cepstrum yields accuracies of 86% and 93% for speaker-independent isolated digit recognition using VQ and the combination of VQ and HMM, respectively. The same feature gives 99% and 100% accuracy for speaker-independent continuous speech recognition using VQ and the combination of VQ and HMM, respectively.
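The decision rule described above (quantize the feature vectors to codebook indices, score the index sequence against each HMM, pick the model with the maximum log-likelihood) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy two-codeword codebook, and the hand-set HMM parameters are all assumptions, and both the perceptual feature extraction and the expectation-maximization training step are omitted.

```python
import math

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword
    (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def safe_log(p):
    """log(p) that returns -inf for zero probabilities."""
    return math.log(p) if p > 0 else -math.inf

def forward_loglik(obs, pi, A, B):
    """Log P(obs | model) for a discrete HMM, computed with the forward
    algorithm in the log domain. obs must be a non-empty index sequence.
    pi: initial state probabilities, A: transition matrix,
    B: emission matrix (states x codebook size)."""
    n = len(pi)
    alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + safe_log(B[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)

def recognize(frames, codebook, models):
    """Return the label whose HMM gives the maximum log-likelihood,
    together with all per-model scores."""
    obs = quantize(frames, codebook)
    scores = {label: forward_loglik(obs, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get), scores

# Toy setup (hypothetical values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0]]
uniform_pi = [0.5, 0.5]
uniform_A = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "zero": (uniform_pi, uniform_A, [[0.9, 0.1], [0.9, 0.1]]),
    "one": (uniform_pi, uniform_A, [[0.1, 0.9], [0.1, 0.9]]),
}
test_frames = [[0.1, -0.1], [0.0, 0.2], [-0.2, 0.1]]
label, scores = recognize(test_frames, codebook, models)
```

In this toy setup every test frame is closest to codeword 0, so the model whose emissions favor index 0 receives the higher log-likelihood. Working in the log domain avoids the numerical underflow that plain probability products would cause on longer utterances.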
