Chinese Questions Classification in the Law Domain

Question classification is an essential part of a question answering (QA) system. This paper presents our work on automatic question classification, using a sample set of questions collected from a legal forum. We propose a taxonomy for legal questions that divides them into three main categories (civil, criminal, and administrative), following the Chinese legal system. We experimented with four machine learning algorithms, Nearest Neighbors (NN), Naïve Bayes (NB), Logistic Regression (LR), and Support Vector Machines (SVM), using two kinds of features: TF-IDF and word2vec embeddings. We also applied fastText and tuned its parameters to obtain better results. The experiments show high accuracy for Chinese question classification in the law domain. To the best of our knowledge, this work is the first attempt in this promising domain.
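To make the TF-IDF plus Nearest Neighbors combination concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. The training questions, the English stand-in tokens, and the function names (`fit_idf`, `tfidf`, `classify`) are all hypothetical; real Chinese questions would first need word segmentation, which is assumed to have been done already. The smoothed IDF formula is one common variant and is an assumption here.

```python
import math
from collections import Counter

# Hypothetical toy data: each question is assumed to be already segmented
# into space-separated tokens (English words stand in for Chinese terms).
TRAIN = [
    ("divorce property division dispute", "civil"),
    ("contract breach compensation dispute", "civil"),
    ("theft sentence prison years", "criminal"),
    ("robbery sentence prison penalty", "criminal"),
    ("administrative license revoked appeal", "administrative"),
    ("administrative fine reconsideration appeal", "administrative"),
]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each token once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector (dict of token -> weight)."""
    tf = Counter(t for t in tokens if t in idf)  # drop unseen tokens
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def classify(question, train_vecs, idf):
    """Assign the label of the single most similar training question (1-NN)."""
    q = tfidf(question.split(), idf)
    _, label = max(train_vecs, key=lambda item: cosine(q, item[0]))
    return label


docs = [text.split() for text, _ in TRAIN]
idf = fit_idf(docs)
train_vecs = [(tfidf(tokens, idf), label)
              for tokens, (_, label) in zip(docs, TRAIN)]

print(classify("prison sentence for theft", train_vecs, idf))
```

The same TF-IDF vectors could be fed to Naïve Bayes, Logistic Regression, or an SVM instead of the nearest-neighbor rule; only the `classify` step would change.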
