Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification

We consider the problem of multi-label classification where the labels lie in a hierarchy. However, unlike most existing work in hierarchical multi-label classification, we do not assume that the label hierarchy is known. Encouraged by the recent success of hyperbolic embeddings in capturing hierarchical relations, we propose to jointly learn the classifier parameters and the label embeddings. Such joint learning is expected to provide a twofold advantage: i) the classifier generalises better because it leverages the prior knowledge that a hierarchy exists over the labels, and ii) in addition to label co-occurrence information, the label embeddings may benefit from the manifold structure of the input data points, leading to embeddings that are more faithful to the label hierarchy. We propose a novel formulation for this joint learning and empirically evaluate its efficacy. The results show that joint learning improves over a baseline that employs pre-trained hyperbolic embeddings based on label co-occurrence. Moreover, the proposed classifiers achieve state-of-the-art generalization on standard benchmarks. We also evaluate the hyperbolic embeddings obtained by joint learning and show that they represent the hierarchy more accurately than the alternatives.
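To make the joint learning idea concrete, the sketch below combines a standard multi-label classification loss with a margin loss over hyperbolic (Poincaré-ball) label embeddings, so that classifier parameters and label embeddings are updated together. This is a minimal sketch of the general recipe rather than the paper's exact formulation: the linear classifier, the pair-based margin loss, and the weight `lambda_emb` are illustrative assumptions.

```python
# Minimal sketch (assumed formulation, not the authors' exact objective) of
# jointly training a classifier and Poincare-ball label embeddings in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    sq_u = u.pow(2).sum(-1).clamp(max=1 - eps)
    sq_v = v.pow(2).sum(-1).clamp(max=1 - eps)
    sq_diff = (u - v).pow(2).sum(-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1 + eps))


class JointModel(nn.Module):
    def __init__(self, in_dim, num_labels, emb_dim=10):
        super().__init__()
        # A linear layer stands in for a text encoder + classifier head.
        self.classifier = nn.Linear(in_dim, num_labels)
        # Label embeddings initialised well inside the unit ball.
        self.label_emb = nn.Parameter(0.001 * torch.randn(num_labels, emb_dim))

    def forward(self, x):
        return self.classifier(x)  # multi-label logits


def joint_loss(model, x, y, pos_pairs, neg_pairs, lambda_emb=0.1):
    """Classification loss plus a margin loss that pulls related labels together.

    pos_pairs / neg_pairs: LongTensors of shape (P, 2) holding label-index pairs
    that should be close / far in the embedding space (e.g. from co-occurrence).
    """
    cls_loss = F.binary_cross_entropy_with_logits(model(x), y.float())
    e = model.label_emb
    d_pos = poincare_distance(e[pos_pairs[:, 0]], e[pos_pairs[:, 1]])
    d_neg = poincare_distance(e[neg_pairs[:, 0]], e[neg_pairs[:, 1]])
    emb_loss = F.relu(1.0 + d_pos - d_neg).mean()  # margin of 1 on pair distances
    return cls_loss + lambda_emb * emb_loss
```

In practice the label embeddings must be kept inside the unit ball during optimisation, e.g. by using a Riemannian optimizer or by projecting embeddings back to norm < 1 after each gradient step; the sketch omits that step for brevity.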
