Learning Based Approaches for Vietnamese Question Classification Using Keywords Extraction from the Web

This paper presents our research on automatic question classification for Vietnamese using machine learning approaches. We experimented with several machine learning algorithms using two groups of features: bag-of-words and keywords. Our research focuses on the two most important tasks: corpus building, and feature extraction, for which we crawl data from the Web to build a keyword corpus. The results are promising: our system achieves higher precision than the state-of-the-art Tree Kernel approach (Collins and Duffy, 2001) on a Vietnamese question corpus.
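To make the two feature groups concrete, the following is a minimal sketch, not the system described in this paper. It combines bag-of-words features with keyword-class features and feeds them to a small multinomial Naive Bayes classifier with add-one smoothing. The names `KEYWORDS`, `extract_features`, and `NaiveBayes`, the two class labels, and the English toy questions are all assumptions made for illustration. In the paper the keyword corpus is built by crawling the Web, and Vietnamese input would also need word segmentation, which this sketch omits.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword lists per question class (an assumption for this sketch;
# the paper builds its keyword corpus from crawled Web data).
KEYWORDS = {
    "LOCATION": {"city", "country", "river"},
    "HUMAN": {"president", "author", "singer"},
}


def extract_features(question):
    """Return bag-of-words features plus one feature per matched keyword class."""
    tokens = question.lower().replace("?", "").split()
    features = ["bow=" + t for t in tokens]
    for label, words in KEYWORDS.items():
        if any(t in words for t in tokens):
            features.append("kw=" + label)
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for q, y in zip(questions, labels):
            feats = extract_features(q)
            self.feature_counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, question):
        feats = extract_features(question)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(n / total)
            denom = sum(self.feature_counts[y].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feature_counts[y][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Toy training data (illustrative only, not from the paper's corpus).
TRAIN = [
    ("Which city is the capital of France?", "LOCATION"),
    ("Which country has the longest river?", "LOCATION"),
    ("Who is the president of Brazil?", "HUMAN"),
    ("Who is the author of this novel?", "HUMAN"),
]
model = NaiveBayes().fit([q for q, _ in TRAIN], [y for _, y in TRAIN])
```

In this toy setup, `model.predict("Name a famous singer")` returns `"HUMAN"` even though none of the question's words occur in the training data, because the keyword feature `kw=HUMAN` generalizes across different keywords of the same class. This is the kind of benefit a keyword corpus is meant to add on top of plain bag-of-words features.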

[1] Dell Zhang, et al. Question classification using support vector machines, 2003, SIGIR.

[2] Dan Roth, et al. Learning question classifiers: the role of semantic information, 2005, Natural Language Engineering.

[3] Ellen M. Voorhees, et al. Overview of the TREC 2004 Novelty Track, 2005.

[4] Ellen M. Voorhees, et al. The TREC-8 Question Answering Track Report, 1999, TREC.

[5] Andreas Christmann, et al. Support vector machines, 2008, Data Mining and Knowledge Discovery Handbook.

[6] Dai Quoc Nguyen, et al. A Vietnamese Question Answering System, 2009, 2009 International Conference on Knowledge and Systems Engineering.

[7] B. Parhami. Voting algorithms, 1994.

[8] Phuong-Thai Nguyen, et al. An Experimental Study on Lexicalized Statistical Parsing for Vietnamese, 2009, 2009 International Conference on Knowledge and Systems Engineering.

[9] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[10] Oanh Thi Tran, et al. An Experimental Study of Vietnamese Question Answering System, 2009, 2009 International Conference on Asian Language Processing.

[11] Marti A. Hearst. Trends & Controversies: Support Vector Machines, 1998, IEEE Intell. Syst.

[12] J. Ross Quinlan, et al. Induction of Decision Trees, 1986, Machine Learning.

[13] Ellen M. Voorhees, et al. Overview of the TREC-9 Question Answering Track, 2000, TREC.

[14] T. Bayes. An essay towards solving a problem in the doctrine of chances, 2003.

[15] Eduard H. Hovy, et al. Toward Semantics-Based Answer Pinpointing, 2001, HLT.

[16] Alessandro Moschitti, et al. A Study on Convolution Kernels for Shallow Statistic Parsing, 2004, ACL.

[17] Michael Collins, et al. Convolution Kernels for Natural Language, 2001, NIPS.