Meta-LMTC: Meta-Learning for Large-Scale Multi-Label Text Classification

Large-scale multi-label text classification (LMTC) tasks often face long-tailed label distributions, where many labels have few or even no training instances. Although current methods can exploit prior knowledge to handle these few/zero-shot labels, they neglect the meta-knowledge contained in the dataset, which can guide models to learn from few samples. In this paper, we address this problem from a meta-learning perspective for the first time. However, simply extending meta-learning approaches to multi-label classification is sub-optimal for LMTC tasks because of the long-tailed label distribution and the coexistence of few- and zero-shot scenarios. We therefore propose a meta-learning approach named META-LMTC. Specifically, it constructs more faithful and more diverse tasks using well-designed sampling strategies, and it directly incorporates the objective of adapting to new low-resource tasks into the meta-learning phase. Extensive experiments show that META-LMTC achieves state-of-the-art performance against strong baselines and further improves even powerful BERT-like models.
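The abstract does not spell out the sampling strategies, so the following is only a rough illustration of what a single meta-learning episode mixing few- and zero-shot labels might look like. It is a minimal sketch under stated assumptions, not META-LMTC's actual procedure: the function `sample_task`, its parameters, and the rule that zero-shot labels receive query documents but no support documents are all hypothetical. Because documents are multi-label, a support document that also carries a zero-shot label would leak that label, so the sketch drops such documents from the support set.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_task(
    label_to_docs: Dict[str, List[int]],
    n_labels: int,
    k_support: int,
    k_query: int,
    n_zero_shot: int,
    rng: random.Random,
) -> Tuple[List[str], List[int], List[int], Set[str]]:
    """Hypothetical sampler for one multi-label episode with few- and zero-shot labels."""
    # Invert the index: document id -> every label attached to it.
    doc_labels: Dict[int, Set[str]] = {}
    for label, docs in label_to_docs.items():
        for doc in docs:
            doc_labels.setdefault(doc, set()).add(label)

    # Assumption: only labels with enough documents for support + query are eligible.
    eligible = sorted(
        label for label, docs in label_to_docs.items()
        if len(docs) >= k_support + k_query
    )
    labels = rng.sample(eligible, n_labels)  # returned in random order
    zero_shot = set(labels[:n_zero_shot])    # these labels get no support examples

    support: Set[int] = set()
    query: Set[int] = set()
    for label in labels:
        docs = rng.sample(label_to_docs[label], k_support + k_query)
        if label in zero_shot:
            query.update(docs[:k_query])
        else:
            support.update(docs[:k_support])
            query.update(docs[k_support:])

    # Drop support documents that would reveal a zero-shot label,
    # then keep the support and query sets disjoint.
    support = {d for d in support if not doc_labels[d] & zero_shot}
    query -= support
    return labels, sorted(support), sorted(query), zero_shot


# Usage on a toy label index (label -> document ids); "e" is too rare to be eligible.
data = {"a": [0, 1, 2, 3], "b": [2, 3, 4, 5], "c": [6, 7, 8, 9],
        "d": [0, 9, 10, 11], "e": [1]}
labels, support, query, zs = sample_task(data, 3, 1, 2, 1, random.Random(0))
```

In a typical episodic setup (for example MAML-style training), a model would adapt on the support documents and be evaluated on the query documents, so the zero-shot labels in each episode must be predicted without any labeled examples of their own.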
