Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most existing models are trained with the standard cross-entropy loss and apply a fixed prediction policy (e.g., a threshold of 0.5) to every label, which ignores both the varying difficulty of individual labels and the dependencies among them. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method uses a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss, and the prediction policies are applied at inference time. Experimental results on fine-grained entity typing and text classification demonstrate that the proposed method produces more accurate multi-label classification results.
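
To make the idea concrete, below is a minimal sketch in PyTorch of what per-label training and prediction policies can look like. This is not the paper's actual architecture: here the two policies are plain learnable tensors (a hypothetical `label_weights` that re-weights each label's cross-entropy term, and a hypothetical `thresholds` that replaces the fixed 0.5 cutoff), whereas the paper learns such policies with a meta-learner. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only.
NUM_LABELS, FEAT_DIM, BATCH = 5, 16, 8

# Base multi-label classifier: one logit per label.
classifier = nn.Linear(FEAT_DIM, NUM_LABELS)

# "Training policy": a learnable per-label weight on the cross-entropy terms.
# "Prediction policy": a learnable per-label decision threshold
# instead of a fixed 0.5 for all labels.
label_weights = nn.Parameter(torch.ones(NUM_LABELS))
thresholds = nn.Parameter(torch.full((NUM_LABELS,), 0.5))

x = torch.randn(BATCH, FEAT_DIM)                      # toy features
y = torch.randint(0, 2, (BATCH, NUM_LABELS)).float()  # toy label matrix

probs = torch.sigmoid(classifier(x))

# Per-label binary cross-entropy, re-weighted by the training policy.
bce = F.binary_cross_entropy(probs, y, reduction="none")  # (BATCH, NUM_LABELS)
loss = (bce * torch.sigmoid(label_weights)).mean()
loss.backward()

# Prediction: compare each label's probability against its own threshold.
preds = (probs > thresholds).float()
print(preds.shape)  # torch.Size([8, 5])
```

In this toy version the policies would simply be optimized by gradient descent alongside the classifier; the paper instead treats them as outputs of a meta-learner, which is what allows it to model dependencies across labels rather than tuning each label in isolation.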
