Multi-Task CNN for Classification of Chinese Legal Questions

This paper proposes a multi-task learning approach to classifying Chinese legal questions with deep convolutional neural networks (CNNs). First, we propose a multi-task CNN with trainable word embeddings for classifying Chinese legal questions, in which coarse-grained classification is the main task and fine-grained classification is the side task. Second, we develop a hierarchical classification model that takes the output of the coarse-grained classifier as part of the input to the fine-grained classifier. We find that the side task improves classification accuracy and efficiency to a certain extent. Our experiments on the entire Chinese Legal Questions Dataset (LQDS) demonstrate the effectiveness of the proposed approach. To the best of our knowledge, this is the first work to use almost all of the data in LQDS for classification, and we achieve state-of-the-art performance.
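The two architectures described above can be sketched as follows. This is a minimal numpy forward pass, not the authors' implementation: all layer sizes, the ReLU/max-over-time pooling choice, and the side-task weight `lam` are assumptions, since the abstract does not specify hyperparameters. The shared trunk (embedding lookup, 1-D convolution, pooling) feeds two softmax heads for the multi-task model, and the hierarchical variant additionally concatenates the coarse-grained probabilities into the fine-grained classifier's input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper does not report its hyperparameters.
VOCAB, EMB, FILTERS, WIDTH = 1000, 50, 64, 3
N_COARSE, N_FINE = 8, 40          # coarse/fine label counts (assumed)
SEQ = 20                          # question length in tokens

# Shared, trainable parameters.
embedding = rng.normal(0, 0.1, (VOCAB, EMB))        # trainable word embeddings
conv_w    = rng.normal(0, 0.1, (FILTERS, WIDTH, EMB))
coarse_w  = rng.normal(0, 0.1, (FILTERS, N_COARSE))             # main-task head
fine_w    = rng.normal(0, 0.1, (FILTERS, N_FINE))               # side-task head
hier_w    = rng.normal(0, 0.1, (FILTERS + N_COARSE, N_FINE))    # hierarchical head

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def trunk(token_ids):
    """Shared CNN trunk: embed, convolve over token windows, ReLU, max-pool."""
    x = embedding[token_ids]                                # (SEQ, EMB)
    windows = np.stack([x[i:i + WIDTH]
                        for i in range(len(token_ids) - WIDTH + 1)])
    feats = np.einsum('nwe,fwe->nf', windows, conv_w)       # (N_WINDOWS, FILTERS)
    return np.maximum(feats, 0).max(axis=0)                 # (FILTERS,)

def forward_multitask(token_ids):
    """Multi-task model: two heads on one shared representation."""
    pooled = trunk(token_ids)
    return softmax(pooled @ coarse_w), softmax(pooled @ fine_w)

def forward_hierarchical(token_ids):
    """Hierarchical model: coarse output becomes part of the fine input."""
    pooled = trunk(token_ids)
    p_coarse = softmax(pooled @ coarse_w)
    p_fine = softmax(np.concatenate([pooled, p_coarse]) @ hier_w)
    return p_coarse, p_fine

def joint_loss(token_ids, coarse_y, fine_y, lam=0.3):
    """Joint objective: main (coarse) loss plus a weighted side (fine) loss."""
    pc, pf = forward_multitask(token_ids)
    return -np.log(pc[coarse_y]) - lam * np.log(pf[fine_y])
```

In this sketch the side task contributes only a weighted term to the joint loss, so gradients from fine-grained labels regularize the shared trunk without dominating the main task.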
