Spoken Indian language classification using artificial neural network — An experimental study

The ability to classify and identify the spoken language is of immense value in a multilingual society such as India. Spoken language identification is the task of determining which language is being spoken in an audio utterance. Spoken Indian language identification is an active research area because enterprises and government agencies are increasingly deploying voice-based self-help systems for mass use. Deep neural networks (DNNs) have reduced the need for hand-crafted feature extraction, but they require large amounts of annotated data. Unfortunately, most Indian languages are low-resource, so the advantages of DNNs cannot be exploited. In this paper, we experiment with different speech feature sets to train artificial neural network (ANN) classifiers using the backpropagation algorithm. We present results for different configurations of the feature sets using five-fold cross-validation. Experimental results show that MFCC features augmented with delta and double-delta coefficients consistently give better recognition accuracy with the ANN classifier than the other feature sets tested.
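The abstract names delta and double-delta MFCC features and five-fold cross-validation without defining either. The sketch below shows one common way to compute both, assuming MFCC frames have already been extracted (for example, 13 coefficients per frame). Deltas use the standard regression formula d_t = Σ_{n=1..N} n(c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²), and double deltas apply the same formula to the deltas. The window N = 2, the edge-replication padding, the shuffled fold assignment, and the function names `deltas`, `stack_mfcc_deltas`, and `k_fold_splits` are illustrative assumptions, not the authors' published settings.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Frame = List[float]


def deltas(frames: Sequence[Sequence[float]], N: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over a window of +/-N frames.

    Assumption: edge frames are replicated so every output frame sees a
    full window (the paper does not state its padding choice).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out: List[Frame] = []
    for t in range(T):
        row: Frame = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][d]   # clamp at last frame
                behind = frames[max(t - n, 0)][d]      # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_mfcc_deltas(mfcc: Sequence[Sequence[float]], N: int = 2) -> List[Frame]:
    """Concatenate static MFCCs with their delta and double-delta coefficients.

    A 13-coefficient frame becomes a 39-dimensional feature vector.
    """
    d1 = deltas(mfcc, N)
    d2 = deltas(d1, N)  # double delta = delta of the delta
    return [list(c) + a + b for c, a, b in zip(mfcc, d1, d2)]


def k_fold_splits(n_samples: int, k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Samples are shuffled once with a fixed seed (an assumption), then cut
    into k nearly equal folds; each fold serves as the test set once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

In use, each utterance's MFCC matrix would be passed through `stack_mfcc_deltas`, pooled or windowed into fixed-size inputs for the ANN, and the resulting dataset split with `k_fold_splits(len(dataset), k=5)`; the classifier is trained on each `train` set and accuracy is averaged over the five `test` folds. Replicating edge frames keeps the delta sequence the same length as the MFCC sequence, which makes the simple concatenation possible.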