Modeling Dynamic Pairwise Attention for Crime Classification over Legal Articles

In the juridical field, judges usually need to consult several relevant cases to determine the specific articles that a piece of evidence violates, a task that is time-consuming and requires extensive professional knowledge. In this paper, we focus on how to reduce this manual effort and make the conviction process more efficient. Specifically, we treat pieces of evidence as documents and articles as labels, so that the conviction process can be cast as a multi-label classification problem. The challenge in this scenario lies in two aspects. The first is that the number of articles violated varies from one piece of evidence to another, which we call the label dynamic problem. The second is that most articles are violated by only a few pieces of evidence, which we call the label imbalance problem. Previous methods usually learn the multi-label classification model and the label thresholds independently, and may ignore the label imbalance problem. To tackle both challenges, we propose a unified Dynamic Pairwise Attention Model (DPAM for short). DPAM adopts the multi-task learning paradigm to learn the multi-label classifier and the threshold predictor jointly, and can thus improve generalization performance by leveraging the information learned in both tasks. In addition, a pairwise attention model based on article definitions is incorporated into the classification model to help alleviate the label imbalance problem. Experimental results on two real-world datasets show that our proposed approach significantly outperforms state-of-the-art multi-label classification methods.
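To make the label dynamic problem and the joint training objective more concrete, the following is a minimal sketch and not the DPAM implementation. It assumes that the classifier outputs one score in (0, 1) per article and that a separate predictor outputs one threshold per document. The function names, the binary cross-entropy and squared-error loss forms, the `weight` parameter, and the toy scores are all illustrative assumptions that do not come from the paper.

```python
import math


def predict_labels(scores, threshold):
    """Return the indices of articles whose score exceeds the document's threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def binary_cross_entropy(scores, targets, eps=1e-12):
    """Mean binary cross-entropy over all articles for one document."""
    total = 0.0
    for s, t in zip(scores, targets):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(s) + (1 - t) * math.log(1.0 - s)
    return total / len(scores)


def joint_loss(scores, targets, predicted_threshold, target_threshold, weight=0.5):
    """Hypothetical multi-task objective: classification loss plus a weighted
    squared error on the predicted threshold (an assumed form)."""
    classification = binary_cross_entropy(scores, targets)
    threshold_error = (predicted_threshold - target_threshold) ** 2
    return classification + weight * threshold_error


# Toy scores for two documents over four articles (made-up numbers).
doc_a = [0.9, 0.2, 0.7, 0.1]
doc_b = [0.4, 0.35, 0.1, 0.05]

# A per-document threshold lets the number of predicted articles vary.
labels_a = predict_labels(doc_a, 0.5)
labels_b = predict_labels(doc_b, 0.3)
```

With a single global threshold of 0.5, the second document would receive no labels at all; a per-document threshold is one simple way to let the number of predicted articles adapt to each case.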
