Abstract: Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the language to be recognized. This paper introduces a speech recognition application that recognizes the words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). The research used 400 speech samples: 75 samples per word as training data and 25 samples per word as test data. The system uses 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. It was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained from the SVM with a linear kernel, a cost value of 100, and a data set consisting of 75 samples from each class. During the training phase, the system achieved an F1-score (the harmonic mean of precision and recall) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". In the testing phase, which used 25 new samples per class, the F1-score was 76% for the "atas" class, 54% for the "bawah" class, 44% for the "kanan" class, and 100% for the "kiri" class.
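As a minimal sketch of how the per-class F1-scores reported above can be computed, the following Python snippet derives precision, recall, and F1 for each of the four word classes from paired lists of true and predicted labels. The function name `per_class_f1` and the toy predictions are hypothetical and chosen only for illustration; they are not the paper's data or implementation.

```python
from collections import Counter

# The four word classes named in the abstract.
LABELS = ["atas", "bawah", "kanan", "kiri"]


def per_class_f1(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from paired label lists.

    Hypothetical helper for illustration; not the authors' code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[guess] += 1      # the guessed class gains a false positive
            fn[truth] += 1      # the true class gains a false negative

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy predictions (NOT the paper's data): 4 utterances per class.
y_true = ["atas"] * 4 + ["bawah"] * 4 + ["kanan"] * 4 + ["kiri"] * 4
y_pred = (["atas", "atas", "atas", "bawah"]
          + ["bawah", "bawah", "kanan", "kanan"]
          + ["kanan", "kanan", "atas", "bawah"]
          + ["kiri"] * 4)

scores = per_class_f1(y_true, y_pred, LABELS)
for label in LABELS:
    p, r, f1 = scores[label]
    print(f"{label:>5}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is computed per class, a class that is never confused with the others (here "kiri" in the toy data) can reach 100% while the remaining classes score lower, which is the pattern the abstract reports.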