Assessing Short Answers in Indonesian Using Semantic Text Similarity Method and Dynamic Corpus

Automatic short-answer assessment is a computer-assisted testing task in which answers written in natural language are scored automatically. Several methods have been used to build systems whose scores are close to those given by human markers. For Indonesian, string-based similarity methods that match keywords are easy to apply, as previous studies have done. However, short answers are characterized by their content, question type, and answer length, which string-based methods alone cannot accommodate. This study implements a hybrid method that uses both corpus-based and string-based similarity. The Semantic Text Similarity (STS) method is used to assess short answers in Indonesian. STS combines three similarity measures: Normalized and Modified Longest Common Subsequence, Second Order Co-occurrence Pointwise Mutual Information, and Common Word Order Similarity. We also use a dynamic corpus, which has the advantage of being relatively small and adaptable to the learning domain. The dynamic corpus is generated with the Gensim module and consists of the top five student answers that the module returns. The STS method is compared with Cosine Similarity, the method most commonly used to assess answers in Indonesian. The results show that the STS method outperforms Cosine Similarity in terms of Mean Absolute Error, but does not yet outperform it in terms of correlation.
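
As an illustration of the string-based side of the comparison, the sketch below shows a normalized longest common subsequence score, a bag-of-words cosine similarity, and the Mean Absolute Error used for evaluation. It is a minimal sketch, not the implementation used in this study: the abstract does not state the formulas, so the normalization (squared LCS length divided by the product of the two string lengths) is an assumed, commonly used definition, the modified LCS variants and the corpus-based component are omitted, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a: str, b: str) -> float:
    """Normalized LCS in [0, 1]; assumed form: len(LCS)^2 / (len(a) * len(b))."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of whitespace-tokenized, lowercased term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute difference between system scores and human scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(p - h) for p, h in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical reference answer and student answer, for illustration only.
    reference = "fotosintesis menghasilkan oksigen"
    student = "fotosintesa menghasilkan oksigen dan glukosa"
    print(nlcs(reference, student))
    print(cosine_similarity(reference, student))
```

The character-level score tolerates small spelling variations that exact keyword matching would miss, whereas the cosine score only rewards identical tokens. Neither captures meaning, which is the gap the corpus-based component is meant to fill.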