Question Classification using Semantic, Syntactic and Lexical features

Question classification is an important component of question answering systems. Its task is to predict the entity type of the answer to a natural language question, and it is typically performed with machine learning techniques. This paper presents our research on question classification through a machine learning approach. To train the learning model, we designed a rich set of features that are predictive of question categories. Different lexical, syntactic and semantic features can be extracted from a question; in this work we combine all three kinds, which improves classification accuracy. We adopted three different classifiers, Nearest Neighbors (NN), Naive Bayes (NB) and Support Vector Machines (SVM), using two kinds of features: bag-of-words and bag-of-n-grams. We found that the SVM classifier combined with the semantic, syntactic and lexical features improves classification accuracy. We tested the proposed approaches on the well-known UIUC dataset and achieved a new record for classification accuracy on this dataset.
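To make the bag-of-words and bag-of-n-grams features concrete, the following is a minimal sketch in plain Python, paired with a one-nearest-neighbor classifier over cosine similarity. It is not the authors' implementation: the function names, the tokenization, the choice of cosine similarity and the toy training questions are all assumptions made for illustration, and the labels merely imitate the coarse UIUC classes (HUM, LOC, NUM). The syntactic and semantic features, and the NB and SVM classifiers, are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(question, max_n=2):
    """Bag of words plus bag of n-grams (up to max_n) as a sparse count vector.

    Tokenization here is a simple lowercase whitespace split; this is an
    assumption, not the preprocessing used in the paper.
    """
    tokens = question.lower().rstrip("?").split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(train, question):
    """Return the label of the training question most similar to `question`."""
    query = extract_features(question)
    best_label, best_sim = None, -1.0
    for text, label in train:
        sim = cosine(query, extract_features(text))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical toy training set; labels imitate coarse UIUC classes.
TRAIN = [
    ("Who wrote Hamlet ?", "HUM"),
    ("Who invented the telephone ?", "HUM"),
    ("Where is the Eiffel Tower ?", "LOC"),
    ("Where is Mount Everest located ?", "LOC"),
    ("How many people live in Tokyo ?", "NUM"),
    ("How many legs does a spider have ?", "NUM"),
]

if __name__ == "__main__":
    for q in ["Who discovered penicillin ?", "Where is the Nile ?"]:
        print(q, "->", nearest_neighbor(TRAIN, q))
```

In this sketch the bigram "where is" counts as its own feature alongside the unigrams "where" and "is", which is the sense in which n-gram features capture short word sequences that a pure bag of words discards.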
