Study and Implementation of Monolingual Approach on Indonesian Question Answering for Factoid and Non-Factoid Question

We developed an open domain QA system that can handle factoid and nonfactoid questions in Indonesian language by using monolingual approaches. EAT classification is done by identifying question word and clue words. Keyword extraction from question is done by looking at POS information of each word in question, eliminating stop words, and stemming. We use articles from Indonesian Wikipedia as corpus and Lucene framework as the base for passage retriever component, with three additional processing: query expansion, boost EAT, and boost term. For factoid questions, answer finding is done by using Named Entity Recognition. Answer scoring is done by calculating keyword occurrences and answer-keywords distance (MRR = 0.6191). For non-factoid questions, answer finding is done by identifying sentence pattern and clue words. Answer scoring is done by considering pattern priority and keyword occurrences (MRR = 0.8079).