Community question topic categorization via hierarchical kernelized classification

We present a hierarchical kernelized classification model for automatically classifying general questions into their corresponding topic categories in community Question Answering (cQA) services. This could save much of the effort spent on manual classification and make it easier to browse and retrieve questions from cQA archives. To deal with the challenge that questions are short text messages, we explore various cQA features and combine them optimally by introducing a multiple kernel learning strategy into the hierarchical classification framework. We propose a hybrid regularization approach that combines an orthogonal constraint with L1 sparseness, both to improve the discriminative power on similar topics and to sparsify the model parameters. Experimental results on a real-world dataset from Yahoo! Answers demonstrate the effectiveness of the proposed model compared with state-of-the-art methods and strong baselines.
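The two ingredients named above can be illustrated with a minimal sketch. It is not the model from the paper: the function names `combine_kernels` and `hybrid_penalty`, the fixed kernel weights, the parent-array encoding of the category tree, and the exact form of the orthogonality term (absolute inner product between a node's weight vector and its parent's) are all assumptions made for illustration.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of precomputed Gram matrices, one per feature type.

    Assumption: the weights are fixed and non-negative here; a multiple
    kernel learning method would learn them from data.
    """
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))

def hybrid_penalty(W, parent, lam_orth=1.0, lam_l1=0.1):
    """Hypothetical hybrid regularizer over per-category weight vectors.

    W      : array of shape (num_nodes, dim), one row per category node.
    parent : parent index of each node, -1 for a root node.
    The orthogonality term sums |w_i . w_parent(i)| and is small when a
    child's weights are nearly orthogonal to its parent's; the L1 term
    sums |W| and favors sparse weights.
    """
    W = np.asarray(W, dtype=float)
    orth = 0.0
    for i, p in enumerate(parent):
        if p >= 0:
            orth += abs(float(W[i] @ W[p]))
    l1 = float(np.abs(W).sum())
    return lam_orth * orth + lam_l1 * l1

# Toy usage: two 2x2 Gram matrices and a three-node tree (node 0 is the root).
K = combine_kernels([np.eye(2), np.ones((2, 2))], [0.5, 0.5])
penalty = hybrid_penalty([[1, 0], [0, 2], [1, 1]], parent=[-1, 0, 0])
```

In this toy tree, node 1 is orthogonal to the root and contributes nothing to the orthogonality term, while node 2 overlaps with the root and is penalized.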
