Exploiting syntactic and semantic information in coarse Chinese question classification

Recent years have seen great progress in English question classification. In this work, we study Chinese question classification by exploiting the results of lexical, syntactic, and semantic parsing of question sentences. Support vector machines are used to train a classifier over 6 coarse categories, using the different parsing results as features both individually and in combination. We find that even surface information such as words and parts of speech leads to satisfactory results, while augmenting the classifier with syntactic and semantic features gives even higher precision. However, because most questions contain few words and have incomplete syntactic structures, combined features are even sparser in the feature space than single features, which adversely affects the performance of Chinese question classification.
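The sparsity effect described above can be illustrated with a small sketch. It is not the feature set used in this work: the toy questions (English glosses standing in for segmented Chinese), the tag names, and the particular conjunction of a word with the following token's part of speech are all assumptions chosen for illustration. The sketch compares how often, on average, each distinct feature recurs across questions for single and combined feature templates.

```python
from collections import Counter

# Hypothetical toy data: each question is a list of (word, POS) pairs.
# Words are English glosses and the tag names are illustrative only.
QUESTIONS = [
    [("who", "r"), ("invented", "v"), ("paper", "n")],
    [("who", "r"), ("wrote", "v"), ("book", "n")],
    [("where", "r"), ("is", "v"), ("Beijing", "ns")],
    [("when", "r"), ("was", "v"), ("paper", "n"), ("invented", "v")],
]


def word_features(question):
    """Single feature template: one feature per word."""
    return ["w=" + word for word, _ in question]


def pos_features(question):
    """Single feature template: one feature per part-of-speech tag."""
    return ["p=" + pos for _, pos in question]


def combined_features(question):
    """Assumed combined template: a word joined with the next token's POS."""
    return [
        "w=" + question[i][0] + "|p+1=" + question[i + 1][1]
        for i in range(len(question) - 1)
    ]


def mean_frequency(extract, questions=QUESTIONS):
    """Average number of occurrences per distinct feature (lower = sparser)."""
    counts = Counter()
    for question in questions:
        counts.update(extract(question))
    return sum(counts.values()) / len(counts)


if __name__ == "__main__":
    for name, extract in [
        ("words", word_features),
        ("pos", pos_features),
        ("word + next pos", combined_features),
    ]:
        print(f"{name}: {mean_frequency(extract):.3f}")
```

On this toy data the combined template has the lowest average recurrence per feature, so a classifier sees each combined feature fewer times than it sees individual words or tags. With short questions, that leaves less evidence per dimension.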
