Paper information - Statistical-based Approach for Indonesian Complex Factoid Question Decomposition

This research proposes a method to decompose a complex factoid question into several independent questions. The method comprises four stages: (1) classifying the input question into one of several categories (sub-question, coordination, exemplification, or double question), (2) generating all possible question boundary candidates, (3) selecting the best question boundary, and (4) applying the question decomposition rule at the best question boundary. The study compares several machine learning algorithms in the first stage (complex factoid question classification) and the third stage (question decomposition boundary selection). The classification features are specific word lists with their related information, including syntactic features such as the part-of-speech (POS) tag. For the experiments, we annotated 916 sentences as training data and 226 sentences as testing data. The annotated corpus has a perplexity of 1.000586 with 307 out-of-vocabulary (OOV) words. The complex factoid question classification accuracy reached 93.8% with the Random Forest algorithm. The question decomposition boundary selection accuracy reached 93.80% for sub-question (using Random Forest), 86.11% for double question (using Random Forest), 88.23% for coordination (using SMO), and 60.87% for exemplification (using kNN, NB, and RF). A revision rule for the question decomposition boundary selection improved the accuracy to 97.22% for double question, 94.11% for coordination, and 65.21% for exemplification.
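The four stages described above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the authors' implementation: the function names, the "simple" fallback category, the English example, and the keyword-based `toy_classify` and `toy_score` stand-ins are all hypothetical. In the paper, stages 1 and 3 are trained classifiers (for example Random Forest or SMO) over word-list and POS features, and the decomposition rule depends on the question category.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def generate_boundary_candidates(tokens: Tokens) -> List[int]:
    """Stage 2: treat every interior token position as a candidate boundary."""
    return list(range(1, len(tokens) - 1))


def select_best_boundary(
    tokens: Tokens,
    candidates: List[int],
    score: Callable[[Tokens, int], float],
) -> int:
    """Stage 3: keep the candidate that the scorer rates highest."""
    return max(candidates, key=lambda i: score(tokens, i))


def apply_decomposition(tokens: Tokens, boundary: int) -> List[str]:
    """Stage 4 (simplified): split around the boundary token and drop it."""
    return [" ".join(tokens[:boundary]), " ".join(tokens[boundary + 1:])]


def decompose(
    question: str,
    classify: Callable[[Tokens], str],
    score: Callable[[Tokens, int], float],
) -> Tuple[str, List[str]]:
    """Run stages 1 to 4 and return (category, list of questions)."""
    tokens = question.split()
    category = classify(tokens)  # Stage 1
    candidates = generate_boundary_candidates(tokens)
    # "simple" is a hypothetical label for questions that need no split.
    if category == "simple" or not candidates:
        return category, [question]
    best = select_best_boundary(tokens, candidates, score)
    return category, apply_decomposition(tokens, best)


# Toy stand-ins for the trained models (assumptions, English-only).
def toy_classify(tokens: Tokens) -> str:
    # The category label here is illustrative, not the paper's definition.
    return "double question" if "and" in tokens else "simple"


def toy_score(tokens: Tokens, i: int) -> float:
    return 1.0 if tokens[i] == "and" else 0.0


result = decompose("Who wrote Hamlet and who wrote Faust", toy_classify, toy_score)
print(result)
```

Passing the classifier and the scorer as arguments keeps the pipeline independent of any particular learning algorithm, which mirrors how the study swaps algorithms in stages 1 and 3 while leaving the other stages fixed.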

Ayu Purwarianti | Setio Basuki
