Paper Information - Question Classification with Support Vector Machines and Error Correcting Codes

Question Classification with Support Vector Machines and Error Correcting Codes

In this paper we consider a machine learning technique for question classification. The goal is to replace our regular-expression-based classifier with a classifier that learns from a set of labeled questions. We found that creating a rich collection of patterns and keywords with good coverage of questions in an open-domain application requires an enormous amount of time. We decided to use support vector machines, since they have been used successfully on a number of benchmark problems. Although support vector machines are inherently binary classifiers, they can be extended to multi-class classification using binary codes. We represent questions as frequency-weighted vectors of salient terms. We compare our approach to related work that uses relatively complex syntactic/semantic processing to create features and a sparse network of linear units to classify questions. We report results showing the performance of the method.
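The idea of building a multi-class classifier from binary classifiers through binary codes can be illustrated with a minimal sketch. It is not the authors' implementation. The class labels, the 7-bit code matrix, and the function names `hamming` and `decode` are all assumptions chosen for illustration. In the paper's setting each bit would be predicted by one binary support vector machine; here the predicted bits are hard-coded.

```python
# Hypothetical question classes and codewords (not taken from the paper).
# Each column of the matrix defines one binary problem; in the paper's
# setting one binary SVM would be trained per column.
CODE_MATRIX = {
    "PERSON":   (1, 1, 1, 1, 1, 1, 1),
    "LOCATION": (0, 0, 0, 0, 1, 1, 1),
    "DATE":     (0, 0, 1, 1, 0, 0, 1),
    "NUMBER":   (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# Stand-in for the outputs of seven binary classifiers on one question:
# the DATE codeword with one bit (index 5) flipped by a classifier error.
predicted_bits = (0, 0, 1, 1, 0, 1, 1)
print(decode(predicted_bits))
```

Every pair of codewords in this hypothetical matrix differs in four positions, so a single wrong binary decision still decodes to the intended class. This is the error-correcting property that the codes are meant to provide.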

Wayne H. Ward | Kadri Hacioglu
