Advantages to modeling relational data using hypergraphs versus graphs

Because the relational aspects of data matter for decision-making, graph algorithms built on simplified pairwise relationships have been developed to solve a variety of problems. However, evidence has shown that hypergraphs, generalizations of graphs whose (hyper)edges can connect any number of vertices, can better model complex, non-pairwise relationships in data and lead to better-informed decisions. In this work, we compare graph and hypergraph models in the context of spectral clustering. For these clustering problems, we demonstrate that hypergraph models are computationally more efficient and, for many datasets, better capture complex, non-pairwise relationships.
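To make the comparison concrete, the following is a minimal sketch of two-way spectral clustering under each model. It rests on assumptions that the abstract does not state. The graph model is taken to be the clique expansion of the hypergraph, in which every pair of vertices sharing a hyperedge gets a pairwise edge. The hypergraph model uses the normalized hypergraph Laplacian of Zhou, Huang, and Schölkopf with unit hyperedge weights. The toy hyperedges and all function names are hypothetical, and every vertex is assumed to belong to at least one hyperedge.

```python
import numpy as np

# Hypothetical toy data (not from the paper): 6 vertices, 4 hyperedges.
HYPEREDGES = [(0, 1, 2), (1, 2), (3, 4, 5), (2, 3)]
N_VERTICES = 6


def incidence_matrix(hyperedges, n_vertices):
    """Vertex-by-hyperedge incidence matrix H, with H[v, e] = 1 if v is in e."""
    H = np.zeros((n_vertices, len(hyperedges)))
    for j, edge in enumerate(hyperedges):
        H[list(edge), j] = 1.0
    return H


def clique_expansion(H):
    """Pairwise model (an assumption): weight = number of shared hyperedges."""
    A = H @ H.T
    np.fill_diagonal(A, 0.0)  # drop self-loops
    return A


def graph_laplacian(A):
    """Normalized graph Laplacian I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def hypergraph_laplacian(H):
    """I - Dv^{-1/2} H De^{-1} H^T Dv^{-1/2}, with unit hyperedge weights."""
    dv_inv_sqrt = 1.0 / np.sqrt(H.sum(axis=1))  # vertex degrees
    de_inv = 1.0 / H.sum(axis=0)                # hyperedge sizes
    theta = (dv_inv_sqrt[:, None] * H * de_inv[None, :]) @ (
        H.T * dv_inv_sqrt[None, :]
    )
    return np.eye(H.shape[0]) - theta


def two_way_partition(L):
    """Split vertices by the sign of the second-smallest eigenvector."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    labels = (vecs[:, 1] > 0).astype(int)
    # Eigenvector sign is arbitrary, so fix vertex 0 to label 0.
    return labels if labels[0] == 0 else 1 - labels


H = incidence_matrix(HYPEREDGES, N_VERTICES)
A = clique_expansion(H)
graph_labels = two_way_partition(graph_laplacian(A))
hyper_labels = two_way_partition(hypergraph_laplacian(H))

print("nonzeros in H:", np.count_nonzero(H))
print("nonzeros in clique expansion:", np.count_nonzero(A))
print("graph labels:     ", graph_labels)
print("hypergraph labels:", hyper_labels)
```

The nonzero counts hint at where a cost difference can come from: the incidence matrix stores one entry per vertex-hyperedge membership, while the clique expansion stores an entry for every pair of vertices within a hyperedge, which grows quadratically with hyperedge size. For more than two clusters, one would typically run k-means on several leading eigenvectors instead of thresholding a single one.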
