Image-based features for speech signal classification

Analyzing speech signals is an important problem in pattern classification. The task becomes harder when speakers mix languages in conversation, which is common in India, where the dominant language changes from one state to another. The problem is especially pronounced in southern India, where distinguishing between the regional languages is important. In this paper, we propose image-based features for speech signal classification, on the premise that languages can be told apart by visualizing their speech patterns. We extract modified Mel-frequency cepstral coefficient (MFCC) features, namely MFCC-Statistics Grade (MFCC-SG), visualize them using plotting techniques, and feed the resulting images to a convolutional neural network. The study covers the top four languages, namely Telugu, Tamil, Malayalam, and Kannada. Experiments on more than 900 hours of data collected from YouTube, yielding over 150,000 images, achieved a highest accuracy of 94.51%.
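The idea of turning frame-level features into an image can be illustrated with a minimal sketch. The abstract does not define MFCC-SG or name the plotting technique, so everything here is an assumption made for illustration: the per-coefficient mean and standard deviation stand in for the statistical summary, a small hand-written matrix stands in for real MFCC output, and the hypothetical helpers `summarize` and `rasterize` draw the summary as a binary line plot of the kind a convolutional network could take as input.

```python
from statistics import mean, pstdev

def summarize(frames):
    """Per-coefficient mean and standard deviation over all frames.

    Assumption: a stand-in for the paper's MFCC-SG statistics, which the
    abstract does not spell out.
    """
    columns = list(zip(*frames))  # transpose: one tuple per coefficient
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]

def rasterize(values, height=16):
    """Draw values as a binary line plot, one column per value, row 0 on top."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat vector
    image = [[0] * len(values) for _ in range(height)]
    for x, v in enumerate(values):
        y = round((v - lo) / span * (height - 1))  # 0 = lowest value
        image[height - 1 - y][x] = 1               # flip so larger values sit higher
    return image

# Toy stand-in for an MFCC matrix: 3 frames x 4 coefficients (made-up numbers).
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 1.0, 4.0],
    [3.0, 2.0, 2.0, 4.0],
]
features = summarize(frames)           # 4 means followed by 4 standard deviations
image = rasterize(features, height=8)  # 8 x 8 grid of 0/1 pixels
```

In practice the MFCC matrix would come from an audio library and the image from a plotting library; the sketch only shows the shape of the pipeline, from features to a statistical summary to a pixel grid.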
