Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition

Conversational Artificial Intelligence is revolutionizing the world by turning the conventional computer into a human-like computer. Identifying the speaker’s intention is one of the central tasks in conversational Artificial Intelligence. A significant challenge that hinders effective intent identification is the lack of language resources. To address this issue, we present a domain-specific speech command classification system for Sinhala, a low-resource language. It performs intent detection for spoken Sinhala using Automatic Speech Recognition and Natural Language Understanding. The proposed system can be used in value-added applications such as Sinhala speech dialog systems. The system consists of an Automatic Speech Recognition engine that converts continuous, natural Sinhala speech to its textual representation, and a text classifier that identifies the user’s intention from that text. We also present a novel dataset for this task: a 4.15-hour Sinhala speech corpus in the banking domain. Our Sinhala speech command classification system achieves an accuracy of 89.7% in predicting the intent of an utterance. It outperforms the state-of-the-art direct speech-to-intent classification systems developed for the Sinhala language. Moreover, the Automatic Speech Recognition engine achieves a Word Error Rate of 12.04% and a Sentence Error Rate of 21.56%. In addition, our experiments provide useful insights into speech-to-intent classification for researchers working on low-resource spoken language understanding.
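The abstract reports a Word Error Rate and a Sentence Error Rate without defining them. As a minimal sketch, assuming the standard definitions (WER as the word-level edit distance divided by the number of reference words, SER as the fraction of utterances with at least one error), the two metrics could be computed as below. The function names and the English example transcripts are hypothetical placeholders, not the authors' evaluation code or data.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) over (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_ref_words = 0
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_errors += errors
        total_ref_words += len(ref)
        if errors > 0:
            wrong_sentences += 1
    if not pairs or total_ref_words == 0:
        raise ValueError("need at least one non-empty reference transcript")
    return total_errors / total_ref_words, wrong_sentences / len(pairs)


# Hypothetical banking-domain transcripts, in English for readability.
example_pairs = [
    ("check my account balance", "check my account balance"),
    ("transfer money to savings", "transfer money savings"),
]
wer, ser = wer_and_ser(example_pairs)
print(f"WER = {wer:.3f}, SER = {ser:.3f}")
```

In this toy example one of eight reference words is dropped, so the WER is 0.125, while one of the two utterances contains an error, so the SER is 0.5. This also illustrates why a Sentence Error Rate is typically higher than the Word Error Rate on the same data.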
