Breaking Cycles in Noisy Hierarchies

Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. However, in practice, they may have thousands of cycles, as they are often created in a crowd-sourced way. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this paper, we address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches for this problem either need manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies to leverage the inferred hierarchy for removing a small subset of edges to make the graph acyclic. Extensive experiments demonstrate the effectiveness of our approach.
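The general idea of using an inferred hierarchy to break cycles can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each node already has a numeric hierarchy score (in the paper this would come from features such as TrueSkill ratings or social agony; here the scores are hand-assigned, hypothetical values), and that an edge u -> v is expected to point from a lower score to a higher one. The function names `strongly_connected_components` and `break_cycles` are illustrative. Only edges inside a strongly connected component are candidates for removal, since edges outside such components cannot lie on a cycle.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm. Returns a dict mapping node -> component label."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    comp = {}
    for start in reversed(order):
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in comp:
                    comp[prev] = start
                    stack.append(prev)
    return comp


def break_cycles(nodes, edges, score):
    """Remove edges inside an SCC that do not strictly increase the score.

    Assumption: `score` is a precomputed hierarchy score per node, and a
    well-formed edge u -> v satisfies score[u] < score[v].
    """
    comp = strongly_connected_components(nodes, edges)
    kept, removed = [], []
    for u, v in edges:
        if comp[u] == comp[v] and score[u] >= score[v]:
            removed.append((u, v))
        else:
            kept.append((u, v))
    return kept, removed


# Toy graph: a -> b -> c -> a forms a cycle; c -> d is outside it.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
score = {"a": 0, "b": 1, "c": 2, "d": 3}  # hypothetical hierarchy scores

kept, removed = break_cycles(nodes, edges, score)
print(removed)  # [('c', 'a')]
```

Every cycle lies within a single strongly connected component, and every edge kept inside a component strictly increases the score, so no cycle can survive. This simple rule may remove more edges than necessary; the strategies described in the paper aim to remove only a small subset.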
