MODELING LABEL SPACE INTERACTIONS IN MULTI-LABEL CLASSIFICATION USING BOX EMBEDDINGS

Multi-label classification is a challenging structured prediction task in which a set of output class labels is predicted for each input. Real-world datasets often have taxonomic relationships between labels which can be explicit, implicit, or partially observed. Most existing multi-label classification methods either ignore the label taxonomy or require the complete specification of the taxonomy at training and inference time to enforce coherence in their predictions. In this work we introduce the multi-label box model (MBM), a multi-label classification method that combines the encoding power of neural networks with the inductive bias of probabilistic box embeddings (Vilnis et al., 2018), which can be understood as trainable Venn diagrams based on hyper-rectangles. By representing labels as boxes, MBM is able to capture taxonomic relations among labels without them being provided explicitly. Furthermore, since MBM learns the label-label relationships from data and represents them as calibrated conditional probabilities, it provides a high degree of interpretability. This interpretability also facilitates the injection of partial information about label-label relationships into model training, to further improve its consistency. We provide theoretical grounding for our method and show experimentally the model's ability to learn the true latent taxonomic structure from data. Through extensive empirical evaluations on twelve multi-label classification datasets, we show that MBM can significantly improve taxonomic consistency while maintaining state-of-the-art predictive performance.
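The core idea behind box embeddings can be illustrated with a minimal sketch: each label is an axis-aligned hyper-rectangle, and the conditional probability P(A | B) is the volume of the intersection of the two boxes divided by the volume of B. Containment of one box inside another then yields a conditional probability of 1, which is how taxonomic (is-a) relations are expressed. The sketch below uses hard min/max boxes for clarity; the actual MBM model relies on smoothed (e.g. Gumbel) box volumes to keep training differentiable, and the function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def box_volume(lower, upper):
    # Volume of an axis-aligned hyper-rectangle; zero if any side is empty.
    side_lengths = np.clip(upper - lower, 0.0, None)
    return float(np.prod(side_lengths))

def intersection(box_a, box_b):
    # The intersection of two axis-aligned boxes is itself a box (possibly empty):
    # take the element-wise max of the lower corners and min of the upper corners.
    (lower_a, upper_a), (lower_b, upper_b) = box_a, box_b
    return np.maximum(lower_a, lower_b), np.minimum(upper_a, upper_b)

def conditional_prob(box_a, box_b):
    # P(A | B) = Vol(A ∩ B) / Vol(B)
    vol_b = box_volume(*box_b)
    return box_volume(*intersection(box_a, box_b)) / vol_b if vol_b > 0 else 0.0

# Toy taxonomy: the "dog" box is contained in the "animal" box,
# so P(animal | dog) = 1 while P(dog | animal) < 1.
animal = (np.array([0.0, 0.0]), np.array([1.0, 1.0]))
dog = (np.array([0.2, 0.2]), np.array([0.5, 0.5]))
print(conditional_prob(animal, dog))  # containment gives 1.0
print(conditional_prob(dog, animal))  # fraction of the parent's volume
```

Because these conditional probabilities are read off directly from learned geometry, inspecting them after training recovers an (approximate) label taxonomy without it ever being supplied.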

[1] Tejas Chheda, et al. Box Embeddings: An open-source library for representation learning using geometric structures, 2021, EMNLP.

[2] Soumya Chatterjee, et al. Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification, 2021, EACL.

[3] Michael Boratko, et al. Modeling Fine-Grained Entity Types with Box Embeddings, 2021, ACL.

[4] Thomas Lukasiewicz, et al. Coherent Hierarchical Multi-Label Classification Networks, 2020, NeurIPS.

[5] L. Vilnis, et al. Improving Local Identifiability in Probabilistic Box Embeddings, 2020, NeurIPS.

[6] Thomas Lukasiewicz, et al. BoxE: A Box Embedding Model for Knowledge Base Completion, 2020, NeurIPS.

[7] Andrew McCallum, et al. Representing Joint Hierarchies with Box Embeddings, 2020, AKBC.

[8] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[9] Enhong Chen, et al. Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach, 2019, CIKM.

[10] Rodrigo C. Barros, et al. Hierarchical Multi-Label Classification Networks, 2018, ICML.

[11] Andrew McCallum, et al. Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, 2018, ACL.

[12] Thomas Hofmann, et al. Hyperbolic Neural Networks, 2018, NeurIPS.

[13] Xiang Li, et al. Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures, 2018, ACL.

[14] Thomas Hofmann, et al. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings, 2018, ICML.

[15] Luke S. Zettlemoyer, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform, 2018, ArXiv.

[16] Douwe Kiela, et al. Poincaré Embeddings for Learning Hierarchical Representations, 2017, NIPS.

[17] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.

[18] Andrew McCallum, et al. Structured Prediction Energy Networks, 2015, ICML.

[19] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[20] André Carlos Ponce de Leon Ferreira de Carvalho, et al. Hierarchical multi-label classification using local neural networks, 2014, J. Comput. Syst. Sci.

[21] Saso Dzeroski, et al. Hierarchical annotation of medical images, 2011, Pattern Recognit.

[22] J. Demšar. Statistical Comparisons of Classifiers over Multiple Data Sets, 2006, J. Mach. Learn. Res.

[23] R. Schapire, et al. Hierarchical multi-label prediction of gene function, 2006.

[24] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.

[25] Shib Sankar Dasgupta, et al. Box-To-Box Transformations for Modeling Joint Hierarchies, 2021, REPL4NLP.

[26] K. Clarkson, et al. Capacity and Bias of Learned Geometric Embeddings for Directed Graphs, 2021, NeurIPS.

[27] Andrew McCallum, et al. Word2Box: Learning Word Representation Using Box Embeddings, 2021, ArXiv.

[28] Alex A. Freitas, et al. A survey of hierarchical classification across different application domains, 2010, Data Mining and Knowledge Discovery.