Few-Shot Learning for Chinese Legal Controversial Issues Classification

As China’s new procedural system has gradually taken shape, Chinese courts have come to organize trial debates around controversial issues. With the progress of China’s judicial reform, more than 80 million judgment documents have been made public online. Identifying similar controversial issues within and across this massive body of public judgment documents is of significant value to judges in their trial work. Hence, classifying homogeneous controversial issues becomes the basis for similar-case retrieval. However, controversial issues follow a power-law distribution: not all of them fall within the labels provided by manual annotation, and their categories cannot be exhaustively enumerated. To generalize to these unfamiliar categories without extensive retraining, we propose a controversial issues classification approach based on few-shot learning. Two few-shot learning algorithms, Relation Network and Induction Network, are proposed for the controversial issues problem. Given only a handful of labeled instances, both achieve excellent results on the two datasets, which demonstrates their effectiveness in accommodating new categories not seen during training. The proposed method provides trial assistance for judges, promotes the dissemination of experience, and improves the fairness of adjudication.
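
The few-shot setting described above is commonly trained and evaluated on small "episodes", each holding a few labeled instances per category (the support set) plus held-out instances to classify (the query set). The abstract does not specify how episodes are built, so the following is only a minimal sketch under that assumption; the function name `sample_episode` and the parameters `n_way`, `k_shot`, and `n_query` are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def sample_episode(examples, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot episode (illustrative sketch, not the paper's code).

    examples: list of (text, label) pairs, e.g. controversial-issue sentences
              paired with their category labels.
    Returns (support, query), each a list of (text, label) pairs.
    """
    rng = rng or random.Random()

    # Group the texts by category label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)

    # Only categories with enough instances can fill both support and query.
    eligible = [lab for lab, texts in by_label.items()
                if len(texts) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough categories with k_shot + n_query instances")

    support, query = [], []
    for lab in rng.sample(eligible, n_way):
        # Draw disjoint support and query instances for this category.
        picked = rng.sample(by_label[lab], k_shot + n_query)
        support += [(t, lab) for t in picked[:k_shot]]
        query += [(t, lab) for t in picked[k_shot:]]
    return support, query
```

Under this assumption, a model such as Relation Network or Induction Network would score each query instance against the support instances of every sampled category. Because the categories are resampled per episode, categories held out from training can be evaluated the same way without retraining.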
