Characterizing the Hypergraph-of-Entity Representation Model

The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at both the microscopic and macroscopic scales, as well as over time, as the number of documents increases. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose computing a general mixed hypergraph density based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees follow a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents.
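
To make the idea of estimating clustering coefficients by node sampling concrete, the following is a minimal sketch for an ordinary undirected graph, not for the hypergraph-of-entity itself. The function names (`local_clustering`, `sampled_clustering`), the adjacency-dictionary representation and the uniform sampling scheme are assumptions made for illustration; the estimator used in this work for hypergraphs may differ.

```python
import random
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves adjacent."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours no triangle is possible.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def sampled_clustering(adj, sample_size, seed=None):
    """Estimate the average clustering coefficient from a uniform node sample.

    `adj` maps each node to the set of its neighbours (assumed symmetric).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    if not sample:
        return 0.0
    return sum(local_clustering(adj, n) for n in sample) / len(sample)


# Illustrative graph: a triangle (0, 1, 2) with a pendant node 3 attached to 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
estimate = sampled_clustering(example, sample_size=2, seed=42)
```

Sampling trades accuracy for cost: only the neighbourhoods of the sampled nodes are inspected, so the estimate approaches the exact average as `sample_size` approaches the number of nodes.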
