Speech-Based Interaction System Using DNN and i-vector

In this paper, a speech-based interaction system is proposed using two approaches: a Deep Neural Network (DNN) approach and an i-vector-based DNN approach. In the DNN-based approach, Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal and fed directly to the DNN. In the i-vector-based DNN approach, the DNN is trained on i-vectors derived from a Gaussian Mixture Model-Universal Background Model (GMM-UBM). For both approaches, system performance is reported as a confusion matrix and compared; a GMM-UBM baseline is also compared with the proposed work. MFCCs represent the characteristics of the speech, and an autoencoder network is used for classification; it consists of two stacked autoencoder layers and one softmax layer. The performance of the proposed system improves as the number of hidden units and the input dimension of the MFCC features increase. The proposed work develops an ASR system for isolated words in the Tamil language, and the experiments are conducted for the speaker-independent case. The results demonstrate that the i-vector-based DNN approach achieves a 100% recognition rate for 17 classes with 20 hidden units in each of the two layers, using 100-dimensional i-vectors.
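To make the DNN-based pipeline concrete, the following is a minimal sketch, not the authors' implementation: MFCC features are pooled per utterance, two autoencoder layers are pretrained layer-wise, and a softmax layer is added for the 17-class, 20-hidden-unit configuration reported above. The use of librosa and Keras, the 16 kHz sampling rate, mean pooling over frames, and all training hyper-parameters are illustrative assumptions; for the i-vector variant, the input vectors would instead come from a GMM-UBM/total-variability front end, which is not shown here.

import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

def mfcc_features(path, n_mfcc=39):
    """Load an utterance and return a fixed-length MFCC vector (mean over frames; assumption)."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def pretrain_autoencoder(data, n_hidden):
    """Train a single-layer autoencoder on `data` and return its encoder layer."""
    inp = keras.Input(shape=(data.shape[1],))
    encoder = layers.Dense(n_hidden, activation="sigmoid")
    decoded = layers.Dense(data.shape[1], activation="linear")(encoder(inp))
    ae = keras.Model(inp, decoded)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(data, data, epochs=50, batch_size=16, verbose=0)
    return encoder

def build_classifier(X, y, n_classes=17, n_hidden=20):
    """Stack two layer-wise pretrained encoders plus a softmax layer, then fine-tune."""
    enc1 = pretrain_autoencoder(X, n_hidden)          # first autoencoder on MFCC vectors
    X1 = enc1(X).numpy()                              # codes of the first layer
    enc2 = pretrain_autoencoder(X1, n_hidden)         # second autoencoder on those codes

    inp = keras.Input(shape=(X.shape[1],))
    out = layers.Dense(n_classes, activation="softmax")(enc2(enc1(inp)))
    model = keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=100, batch_size=16, verbose=0)
    return model

# Example usage (hypothetical file list and integer labels 0..16):
# X = np.stack([mfcc_features(p) for p in wav_paths])
# model = build_classifier(X, np.asarray(labels))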