Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs and SCO-QAT

Question Answering (QA) research has been conducted in many languages. Nearly all the top-performing systems use heavy methods that require sophisticated techniques, such as parsers or logic provers. However, such techniques are usually unavailable or unaffordable for under-resourced languages or in resource-limited situations. In this article, we describe how a top-performing Chinese QA system can be designed by using lightweight methods effectively. We propose two lightweight methods: the Sum of Co-occurrences of Question and Answer Terms (SCO-QAT) and Alignment-based Surface Patterns (ABSPs). SCO-QAT is a co-occurrence-based answer-ranking method that does not need extra knowledge, word-ignoring heuristic rules, or tools; it calculates co-occurrence scores from the passage retrieval results. ABSPs are syntactic patterns trained from question-answer pairs with a multiple alignment algorithm. They capture the relations between terms, and these relations are then used to filter answers. We attribute the success of the ABSPs and SCO-QAT methods to the effective use of local syntactic information and global co-occurrence information. By using SCO-QAT and ABSPs, we improved the RU-Accuracy of our testbed QA system, ASQA, from 0.445 to 0.535 on the NTCIR-5 dataset. ASQA also achieved the highest RU-Accuracy (0.5) on the NTCIR-6 dataset. These results show that lightweight methods are not only cheaper to implement but can also achieve state-of-the-art performance.
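The abstract describes SCO-QAT only at a high level: an answer-ranking score computed from co-occurrences of question and answer terms in the retrieved passages, with no extra tools. The sketch below is a simplified illustration under our own assumptions, not the paper's exact definition. We assume the score sums, over every non-empty combination of question terms, the number of retrieved passages in which that combination co-occurs with the candidate answer, with no normalization. Matching is plain substring search, so no Chinese word segmenter is required. The function names `sco_qat` and `rank_answers` are hypothetical.

```python
from itertools import combinations
from typing import List


def sco_qat(answer: str, question_terms: List[str], passages: List[str]) -> int:
    """Simplified SCO-QAT-style score (assumed definition, not the paper's).

    Sums, over every non-empty combination of question terms, the number of
    retrieved passages in which all terms of that combination co-occur with
    the candidate answer. Plain substring search stands in for term
    matching, so no segmenter, parser, or other tool is needed.
    """
    # Only passages that mention the answer can contribute a co-occurrence.
    answer_passages = [p for p in passages if answer in p]
    score = 0
    for k in range(1, len(question_terms) + 1):
        for combo in combinations(question_terms, k):
            # Count passages where every term of this combination appears
            # together with the answer.
            score += sum(1 for p in answer_passages if all(t in p for t in combo))
    return score


def rank_answers(candidates: List[str], question_terms: List[str],
                 passages: List[str]) -> List[str]:
    """Order candidate answers by descending (assumed) SCO-QAT score."""
    return sorted(candidates,
                  key=lambda a: sco_qat(a, question_terms, passages),
                  reverse=True)
```

For example, with the passages `["台北是台灣的首都", "台灣的首都是台北市", "高雄是台灣的大城市"]` and the question terms `["台灣", "首都"]` (Taiwan, capital), the candidate 台北 (Taipei) scores 6 and 高雄 (Kaohsiung) scores 1, so `rank_answers(["高雄", "台北"], ...)` returns `["台北", "高雄"]`. The number of combinations grows as 2^n − 1 in the number of question terms, which is acceptable for short factoid questions. A real system would likely prune or weight the combinations.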

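The abstract states that ABSPs are learned from question-answer pairs with a multiple alignment algorithm, but it does not give the procedure. As a rough illustration only, the sketch below substitutes pairwise Smith-Waterman local alignment over two tagged sentences. It turns the best-aligned region into a surface pattern, keeping shared words, generalizing tag-only matches to `<TAG>`, and replacing mismatches or gaps with `*`. It then shows how such a pattern could be matched against a new tagged sentence. The `(word, tag)` token format, the tag names, the scoring constants, and the functions `induce_pattern` and `find_match` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# A token is (surface word, tag); the tag set used here is hypothetical.
Token = Tuple[str, str]

# Assumed alignment scores (not taken from the paper).
WORD_MATCH, TAG_MATCH, MISMATCH, GAP = 2, 1, -1, -1


def similarity(a: Token, b: Token) -> int:
    if a[0] == b[0]:
        return WORD_MATCH
    if a[1] == b[1]:
        return TAG_MATCH
    return MISMATCH


def induce_pattern(x: List[Token], y: List[Token]) -> List[str]:
    """Smith-Waterman local alignment of two tagged sentences.

    The best-scoring local region becomes a pattern: shared words are kept,
    tag-only matches become "<TAG>", and mismatches or gaps become "*".
    """
    n, m = len(x), len(y)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + similarity(x[i - 1], y[j - 1]),
                          h[i - 1][j] + GAP,
                          h[i][j - 1] + GAP)
            if h[i][j] > best:
                best, bi, bj = h[i][j], i, j

    # Trace back from the best cell until the local alignment starts (score 0).
    pattern: List[str] = []
    i, j = bi, bj
    while i > 0 and j > 0 and h[i][j] > 0:
        a, b = x[i - 1], y[j - 1]
        if h[i][j] == h[i - 1][j - 1] + similarity(a, b):
            if a[0] == b[0]:
                pattern.append(a[0])
            elif a[1] == b[1]:
                pattern.append(f"<{a[1]}>")
            else:
                pattern.append("*")
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + GAP:
            pattern.append("*")
            i -= 1
        else:
            pattern.append("*")
            j -= 1
    pattern.reverse()
    return pattern


def find_match(pattern: List[str], tokens: List[Token]) -> Optional[int]:
    """Return the start index of the first contiguous span matching pattern."""
    def ok(p: str, t: Token) -> bool:
        return p == "*" or p == t[0] or p == f"<{t[1]}>"

    for start in range(len(tokens) - len(pattern) + 1):
        if all(ok(p, t) for p, t in zip(pattern, tokens[start:])):
            return start
    return None
```

Aligning "台北是台灣的首都" with "東京是日本的首都" (each tagged word by word) yields the pattern `["<LOC>", "是", "<LOC>", "的", "首都"]`, which then matches "首爾是韓國的首都" at position 0. To approximate true multiple alignment, one could repeatedly align further sentences against the current pattern. This sketch stops at a single pair.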