A Forward-Selection Algorithm for SVM-Based Question Classification in Cognitive Systems

Cognitive Systems have attracted attention in recent years, especially because of the high interactivity of Question Answering systems. In this context, Question Classification plays an important role in identifying the expected answer type. It involves Natural Language Processing of the question, the extraction of a broad variety of features, and the use of machine learning algorithms to map those features onto a given taxonomy of question classes. In this work, a novel learning approach based on Support Vector Machines is proposed for building a set of classifiers, each one applied to different questions and relying on its own features, which are chosen through a specific forward-selection procedure. This approach aims at reducing the total number of features by discarding those that carry scarce information and/or noise. A Question Classification framework is implemented, comprising new, small sets of features. Its application to a benchmark dataset shows classification accuracy competitive with the state of the art while using fewer features.
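The abstract does not state the exact selection criterion, so the following is only a minimal sketch under explicit assumptions: each classifier is a one-vs-rest linear SVM (trained here by plain subgradient descent on the regularised hinge loss so the sketch has no dependencies, not by an SVM library), and features are added greedily by validation accuracy until no remaining feature improves it. The function names `train_linear_svm`, `forward_select`, `per_class_selection` and the toy features are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]


def train_linear_svm(X: List[Vector], y: List[int], feats: List[int],
                     lam: float = 0.01, epochs: int = 200,
                     lr: float = 0.1) -> Model:
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    Labels are +1/-1; only the columns listed in `feats` are used.
    """
    w = [0.0] * len(feats)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xi[f] for wj, f in zip(w, feats)) + b)
            if margin < 1:  # inside the margin: add the hinge subgradient
                for j, f in enumerate(feats):
                    gw[j] -= yi * xi[f] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def accuracy(model: Model, X: List[Vector], y: List[int],
             feats: List[int]) -> float:
    """Fraction of samples whose predicted sign matches the label."""
    w, b = model
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xi[f] for wj, f in zip(w, feats)) + b
        correct += int((1 if score >= 0 else -1) == yi)
    return correct / len(y)


def forward_select(X_tr, y_tr, X_val, y_val, n_features: int,
                   min_gain: float = 1e-9) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves validation accuracy;
    stop as soon as no remaining feature improves it by more than min_gain."""
    selected: List[int] = []
    best_acc = 0.0
    while True:
        best_f = None
        for f in range(n_features):
            if f in selected:
                continue
            cand = selected + [f]
            model = train_linear_svm(X_tr, y_tr, cand)
            acc = accuracy(model, X_val, y_val, cand)
            if acc > best_acc + min_gain:  # ties keep the earlier feature
                best_acc, best_f = acc, f
        if best_f is None:  # no candidate helps: stop adding features
            return selected, best_acc
        selected.append(best_f)


def per_class_selection(X_tr, labels_tr, X_val, labels_val,
                        classes: List[str], n_features: int
                        ) -> Dict[str, Tuple[List[int], float]]:
    """One-vs-rest: run forward selection separately for every class."""
    result = {}
    for c in classes:
        y_tr = [1 if lab == c else -1 for lab in labels_tr]
        y_val = [1 if lab == c else -1 for lab in labels_val]
        result[c] = forward_select(X_tr, y_tr, X_val, y_val, n_features)
    return result


# Hypothetical toy features: [starts with "who", starts with "where",
# starts with "how many", noise]
TOY_X = [
    [1, 0, 0, 1], [1, 0, 0, 0],  # HUM
    [0, 1, 0, 1], [0, 1, 0, 0],  # LOC
    [0, 0, 1, 1], [0, 0, 1, 0],  # NUM
]
TOY_LABELS = ["HUM", "HUM", "LOC", "LOC", "NUM", "NUM"]

if __name__ == "__main__":
    sel = per_class_selection(TOY_X, TOY_LABELS, TOY_X, TOY_LABELS,
                              ["HUM", "LOC", "NUM"], n_features=4)
    for cls, (feats, acc) in sel.items():
        print(cls, feats, acc)
```

On this toy data the noise column is never chosen, and each class keeps only the single indicator feature that separates it from the others, which mirrors the stated goal of discarding uninformative features. The toy run reuses the training set for validation only to stay short; a realistic setup would score candidates on held-out data and would probably add whole feature groups (for example all unigrams, or head-word hypernyms) rather than single columns.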