Recognition of Spoken Languages from Acoustic Speech Signals Using Fourier Parameters

Spoken language identification (LID), also called spoken language recognition (LR), is the process of recognizing the language of a speech utterance. In this paper, a new Fourier parameter (FP) model is proposed for speaker-independent spoken language recognition. The performance of the proposed FP features is analyzed and compared with that of the conventional mel-frequency cepstral coefficient (MFCC) features. FP and MFCC features are extracted from two multilingual databases: the Indian Institute of Technology Kharagpur Multilingual Indian Language Speech Corpus (IITKGP-MLILSC) and the Oriental Language Recognition Speech Corpus (AP18-OLR). Spoken LID/LR models are developed from the extracted FP and MFCC features using three classifiers: support vector machines, feed-forward artificial neural networks, and deep neural networks. Experimental results show that the proposed FP features can effectively recognize different languages from speech signals, and that recognition performance is significantly better with FP features than with MFCC features. Recognition performance improves further when MFCC and FP features are combined.
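
The feature pipeline described above can be illustrated with a minimal sketch. The abstract does not define the FP model, so this code makes several assumptions: per-frame DFT magnitudes of the first few bins stand in for the Fourier parameters, utterance-level features are the mean and standard deviation of each coefficient across frames, and the combination of two feature sets is plain concatenation. The function names, frame length, hop size, and number of coefficients are all hypothetical and are not taken from the paper.

```python
import cmath
import math
from statistics import mean, pstdev


def frame_signal(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames (no padding)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def fourier_magnitudes(frame, num_coeffs):
    """Magnitudes of DFT bins 1..num_coeffs of a Hamming-windowed frame.

    Assumption: these magnitudes are only a stand-in for the paper's
    Fourier parameters, which the abstract does not specify.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(1, num_coeffs + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc) / n)
    return mags


def utterance_features(signal, frame_len=64, hop=32, num_coeffs=8):
    """Mean and standard deviation of each coefficient across frames."""
    per_frame = [fourier_magnitudes(f, num_coeffs)
                 for f in frame_signal(signal, frame_len, hop)]
    tracks = list(zip(*per_frame))  # one track per coefficient
    return [mean(t) for t in tracks] + [pstdev(t) for t in tracks]


def fuse(features_a, features_b):
    """Feature-level fusion by simple concatenation (an assumption)."""
    return list(features_a) + list(features_b)


# Toy input: a pure tone that falls exactly on DFT bin 4 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(256)]
fp_like = utterance_features(tone)
```

In a full system, a fused vector such as `fuse(fp_like, mfcc_vector)` would be passed to a classifier; the MFCC extraction and the classifiers themselves are outside the scope of this sketch.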