Applying question classification to Yahoo! Answers

Question classification is an important part of modern Question Answering systems. Most approaches to question classification are based on handcrafted rules. Recent studies classify simple questions using machine learning techniques and recommend SVM as one of the best-performing classifiers. This study applies a hierarchical classifier based on the SVM machine learning algorithm to questions posed by users, drawn from Yahoo! Answers. The significance of this study is that we attempt to directly classify complex, multi-sentence questions posed by real users. We report the accuracy achieved using both a coarse-grained classifier and a fine-grained classifier to illustrate the effectiveness of our approach on complex questions. We also present a confusion matrix to analyze the results produced by our classifier.
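The evaluation described above (coarse-grained accuracy, fine-grained accuracy, and a confusion matrix) can be illustrated with a minimal sketch. Everything specific in it is an assumption for illustration only: the "COARSE:fine" label format, the label names, and the toy gold and predicted labels are hypothetical and are not taken from the study, and the SVM classifier itself is not reproduced here.

```python
from collections import Counter


def coarse(label):
    """Map a fine-grained label such as 'NUM:date' to its coarse class 'NUM'.

    The 'COARSE:fine' format is an assumed convention, not the study's.
    """
    return label.split(":", 1)[0]


def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def confusion_matrix(gold, pred):
    """Count (gold, predicted) label pairs; off-diagonal pairs are errors."""
    return Counter(zip(gold, pred))


# Hypothetical gold labels and classifier outputs for five questions.
gold_fine = ["NUM:date", "NUM:count", "HUM:ind", "LOC:city", "HUM:ind"]
pred_fine = ["NUM:date", "NUM:date", "HUM:ind", "LOC:country", "ENTY:animal"]

# Collapse fine-grained labels to their coarse classes.
gold_coarse = [coarse(label) for label in gold_fine]
pred_coarse = [coarse(label) for label in pred_fine]

fine_acc = accuracy(gold_fine, pred_fine)
coarse_acc = accuracy(gold_coarse, pred_coarse)
cm = confusion_matrix(gold_coarse, pred_coarse)

print(f"fine-grained accuracy:   {fine_acc:.2f}")
print(f"coarse-grained accuracy: {coarse_acc:.2f}")
for (g, p), n in sorted(cm.items()):
    print(f"gold={g:5s} predicted={p:5s} count={n}")
```

In this toy data, a prediction can be wrong at the fine-grained level yet still correct at the coarse-grained level, which is why the two accuracies are reported separately.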
