Indonesian Automatic Speech Recognition system using CMUSphinx toolkit and limited dataset

Building an Automatic Speech Recognition (ASR) system requires an acoustic model, a language model, and a dictionary for the target language, and Indonesian ASR is no exception. In this paper, we build an Indonesian ASR system using the CMUSphinx toolkit (a Hidden Markov Model based ASR tool) with a limited dataset. We use a digit corpus and our own language model for training on this limited dataset. We also evaluate the trained acoustic model with several speakers under different signal-to-noise ratio (SNR) conditions. The acoustic model achieves a best word accuracy of 86% on average. Under the different SNR conditions, the maximum accuracy is 80%, obtained in a 27.764 dB environment.
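The two quantities reported above, word accuracy and SNR in dB, can be illustrated with a minimal sketch. It assumes the common definitions (word accuracy as one minus the word-level edit distance divided by the reference length, and SNR as ten times the base-10 logarithm of the signal-to-noise power ratio). The abstract does not state the exact formulas used in the paper, and the function names and sample data below are hypothetical.

```python
import math


def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) at the word level."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Row-by-row Levenshtein distance over words:
    # substitutions, deletions, and insertions each cost 1.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (word errors / reference word count)."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - errors / n_ref


def snr_db(signal, noise):
    """Assumed definition: 10 * log10(mean signal power / mean noise power)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    # Hypothetical digit transcript with one substituted word.
    print(word_accuracy("satu dua tiga empat", "satu dua lima empat"))
    # Hypothetical signal and noise samples.
    print(snr_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]))
```

With these definitions, insertion errors can push word accuracy below zero, so it is not strictly a percentage of correctly recognized words.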
