A semantic-based question answering system for Indonesian translation of Quran

This paper presents the development of a semantic-based question answering system (QAS) for the Indonesian Translation of the Quran (ITQ). The research is motivated by the shortcomings of previously built QAS that are caused by keyword-based retrieval. Rather than keeping that retrieval method, we shift to a semantic approach in which retrieval is performed using a semantic similarity measure. To do so, we built an ontology of the ITQ to obtain the concepts and the verses in which they appear. The QAS supports three factoid question types: Who, Where, and When. For each concept belonging to the corresponding expected answer type (also called a named entity group), i.e., Person, Location, or Time, a weighted vector is generated and fed to the semantic interpreter that processes the user question. Of the 222 concepts defined in the ontology, we clustered 77, 24, and 6 concepts into Person, Location, and Time, respectively. Because texts in the ITQ have some particular characteristics, we developed our own modules to handle them, including inverted index generation and named entity recognition. Answer extraction is performed by extracting a set of features that are used to score the answer candidates. The evaluation uses two data sets of questions and answers: the first measures the effectiveness of the semantic approach compared with keyword-based retrieval, and the second assesses system performance with respect to the appearance of concepts in the ITQ.
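The retrieval idea summarized above can be illustrated with a minimal sketch. It is not the system described in this paper: it assumes cosine similarity as the semantic similarity measure, and the concept names, terms, weights, and verse identifiers are hypothetical placeholders rather than entries from the actual ontology. The question type selects the expected answer type, and only concepts of that type are scored against the question terms.

```python
import math

# Question type -> expected answer type, as described in the abstract.
ANSWER_TYPE = {"who": "Person", "where": "Location", "when": "Time"}

# Hypothetical concepts: (answer type, weighted term vector, verse IDs).
# All terms, weights, and verse IDs are made-up placeholders.
CONCEPTS = {
    "Musa": ("Person", {"nabi": 0.8, "tongkat": 0.5}, ["verse_a", "verse_b"]),
    "Mekah": ("Location", {"kota": 0.7, "kabah": 0.9}, ["verse_c"]),
    "Ramadan": ("Time", {"bulan": 0.8, "puasa": 0.9}, ["verse_d"]),
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(question_terms, question_type):
    """Rank concepts of the expected answer type by similarity to the question."""
    expected = ANSWER_TYPE[question_type]
    # Assumption: every question term gets the same weight.
    query = {term: 1.0 for term in question_terms}
    ranked = []
    for name, (answer_type, vector, verses) in CONCEPTS.items():
        if answer_type != expected:
            continue  # skip concepts outside the expected named entity group
        score = cosine(query, vector)
        if score > 0.0:
            ranked.append((score, name, verses))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    for score, name, verses in retrieve(["nabi", "tongkat"], "who"):
        print(name, round(score, 3), verses)
```

In this sketch, a Who question containing the terms "nabi" and "tongkat" is matched only against Person concepts, and the verses attached to the best-scoring concept become the answer candidates. A keyword-based retriever would instead match the question terms directly against verse text.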