Implicit language identification system based on random forest and support vector machine for speech

Human speech carries information about the speaker, the language and the content. The language of an utterance can be identified by extracting language-specific information from it, a task known as Language Identification (LID). LID is useful for translation, speech recognition and speech-activated automatic systems. It may also play an important role in speaker recognition, since knowing the language reduces the search space. This paper proposes an approach to language identification based on Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficient (MFCC) features, using Support Vector Machine (SVM) and Random Forest (RF) classifiers. Both LPC and MFCC are vocal tract features, and when extracted from speech they contain language-related as well as speaker-related information. Identification of language depends heavily on the extraction of language-specific features. These vocal tract parameters carry more information about the spoken language than other parameters such as excitation source parameters and prosodic parameters. Hence a combination of these features performs better than either feature alone. Experiments were performed on a database obtained from IIIT-Hyderabad consisting of 5000 clean multilingual speech signals (Hindi, Bengali, Telugu, Tamil, Marathi and Malayalam). To train the proposed model, 600 speech signals were selected arbitrarily from this database, and a language model was created for each language. The proposed models were evaluated on a further 300 speech signals from the same database, using the individual features as well as the combined features. Experiments using both features together gave better results than experiments using either feature on its own.
According to results reported by other researchers, the accuracy of language identification with these features has so far not exceeded 80%. In the proposed approach, accuracy improves to 92.6% using the combination of the same features with a random forest model.
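
As an illustration of the LPC side of the feature extraction described above, the following is a minimal sketch in pure Python. It computes LPC coefficients for a single frame with the autocorrelation method and the Levinson-Durbin recursion, and then joins them with an MFCC vector. The function names (`autocorrelation`, `lpc`, `combine_features`), the sign convention for the coefficients, and the use of plain concatenation to combine the two feature sets are assumptions made for illustration; the abstract does not state how the features were computed or combined.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame (list of floats)."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """LPC coefficients [1, a1, ..., a_order] via Levinson-Durbin.

    Convention (an assumption of this sketch): the predictor is
    x[n] ~ -(a1*x[n-1] + ... + a_order*x[n-order]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def combine_features(lpc_coeffs, mfcc_coeffs):
    """Join LPC (without the leading 1) and MFCC values into one vector.

    Simple concatenation is assumed here; the MFCC values are expected to
    come from a separate extractor that is not shown.
    """
    return list(lpc_coeffs[1:]) + list(mfcc_coeffs)
```

For a decaying test signal such as `[0.9 ** n for n in range(200)]`, `lpc(frame, 2)` yields a first coefficient close to -0.9 and a second close to zero, which matches a first-order recursive signal. The combined vectors could then be passed to any random forest or SVM implementation for training.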
