A Study on the Autoregressive and non-Autoregressive Multi-label Learning

Extreme classification tasks are multi-label tasks with an extremely large number of labels (tags). These tasks are hard because the label space is usually (i) very large, e.g. thousands or millions of labels, (ii) very sparse, i.e. very few labels apply to each input document, and (iii) highly correlated, meaning that the presence of one label changes the likelihood of all other labels. In this work, we propose a self-attention based variational encoder model that jointly extracts the label-label and label-feature dependencies and predicts labels for a given input. In more detail, we propose a non-autoregressive latent variable model and compare it to a strong autoregressive baseline that predicts each label conditioned on all previously generated labels. Our model can therefore predict all labels in parallel while still capturing both label-label and label-feature dependencies through latent variables, and it compares favourably to the autoregressive baseline. We apply our models to four standard extreme classification natural language datasets and to one news video dataset for automated label detection from a lexicon of semantic concepts. Experimental results show that the autoregressive models, which predict labels in a chain following a given label order, work well when the label set is small or when only the highest-ranked label is predicted, but our non-autoregressive model surpasses them by around 2% to 6% when more labels must be predicted or when the dataset has a larger number of labels.
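The contrast between the two decoding styles described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the linear scorers, the weight matrices `W`, `C`, and `V`, the fixed latent vector `z`, and both function names are assumptions made up for illustration. In the actual approach the latent variable would come from a learned variational model and the scorers would be self-attention networks.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def autoregressive_predict(x, W, C, k):
    """Pick k labels one at a time; each step conditions on the labels chosen so far.

    W[l] is a hypothetical feature weight vector for label l.
    C[l][j] is a hypothetical compatibility weight between labels l and j.
    """
    chosen = []
    for _ in range(k):
        best, best_score = None, -math.inf
        for l in range(len(W)):
            if l in chosen:
                continue
            # Label-feature term plus label-label terms from earlier picks.
            score = dot(W[l], x) + sum(C[l][j] for j in chosen)
            if score > best_score:
                best, best_score = l, score
        chosen.append(best)
    return chosen


def non_autoregressive_predict(x, W, V, z, k):
    """Score every label independently given x and a latent vector z, then take the top k.

    No step depends on another label's prediction, so the scores could be
    computed in parallel; label-label structure is carried only by z.
    """
    probs = [sigmoid(dot(W[l], x) + dot(V[l], z)) for l in range(len(W))]
    return sorted(range(len(W)), key=lambda l: -probs[l])[:k]


# Toy data (assumed): 4 labels, 2 features, 1 latent dimension.
x = [1.0, 0.0]
W = [[2.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
C = [[0.0] * 4 for _ in range(4)]
C[3][0] = 3.0  # label 3 becomes likely once label 0 has been predicted
V = [[0.0], [0.0], [0.0], [1.0]]
z = [3.0]      # stands in for a latent code that encodes the same correlation

ar_labels = autoregressive_predict(x, W, C, k=2)
nar_labels = non_autoregressive_predict(x, W, V, z, k=2)
```

In this toy setup both decoders recover the correlated pair of labels 0 and 3. The autoregressive one needs `k` sequential steps and depends on the order of its picks, whereas the non-autoregressive one scores all labels in a single pass.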
