Automatic categorization of questions for user-interactive question answering

Question categorization, which suggests one of a set of predefined categories for a user's question according to its topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in such a system. The method consists of four steps: feature space construction, topic-wise word identification and weighting, semantic mapping, and similarity calculation. We first construct the feature space from all accumulated questions and compute a feature vector for each predefined category from the accumulated questions it contains. When a new question is posted, its semantic pattern is used to identify and weight the important words of the question. The question is then semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated from their feature vectors, and the category with the highest similarity is assigned to the question. Experimental results show that the proposed method achieves good categorization precision and outperforms traditional categorization methods on the selected test questions.
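
The first and last steps of the pipeline (category feature vectors and similarity-based assignment) can be illustrated with a minimal sketch. It is not the method evaluated in the paper: it assumes a plain bag-of-words feature space with raw term counts and cosine similarity, it omits the semantic mapping step entirely, and it stands in for semantic-pattern weighting with a hypothetical `important_words` set and `boost` factor. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(questions_by_category):
    """Build one term-count vector per category from its accumulated questions."""
    return {
        category: Counter(tok for q in questions for tok in tokenize(q))
        for category, questions in questions_by_category.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(question, category_vectors, important_words=None, boost=2.0):
    """Return the most similar category and the score of every category.

    `important_words` and `boost` are hypothetical stand-ins for the
    semantic-pattern weighting step; the semantic mapping step is omitted.
    """
    important = important_words or set()
    counts = Counter(tokenize(question))
    weighted = {
        term: count * (boost if term in important else 1.0)
        for term, count in counts.items()
    }
    scores = {cat: cosine(weighted, vec) for cat, vec in category_vectors.items()}
    return max(scores, key=scores.get), scores
```

Given a dictionary that maps each category name to its accumulated questions, `build_category_vectors` is called once, and `categorize` is called for every newly posted question; the category with the highest cosine score is the suggestion.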
