HyperX: A Scalable Hypergraph Framework

Hypergraphs generalize graphs by allowing a (hyper)edge to connect any number of vertices, which makes them powerful tools for representing complex, non-pairwise relationships. However, existing graph computation frameworks cannot process hypergraphs without first converting them into graphs, because their APIs do not support hyperedges directly. This conversion can create excessive replicas and produce very large graphs, which makes workloads difficult to balance. A few tools have been developed for hypergraph partitioning, but they are not general-purpose frameworks for hypergraph processing. In this paper, we propose HyperX, a general-purpose distributed hypergraph processing framework built on top of Spark. HyperX is based on Pregel, a user-friendly computation paradigm that has been widely adopted by popular graph computation frameworks. To help create balanced workloads for distributed hypergraph processing, we further investigate the hypergraph partitioning problem and propose a novel label propagation partitioning (LPP) algorithm. We conduct extensive experiments on both real and synthetic data. The results show that, when running hypergraph learning algorithms, HyperX achieves an order-of-magnitude improvement over graph-conversion-based approaches in running time, network communication cost, and memory consumption. For hypergraph partitioning, LPP also significantly outperforms the baseline algorithms on these measures.
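To make the size blow-up from graph conversion concrete, the sketch below counts the edges produced by two standard conversions: clique expansion, which turns a hyperedge over k vertices into k(k-1)/2 pairwise edges, and star expansion, which adds one new vertex per hyperedge and links it to each of its k members. The abstract does not say which conversions HyperX was compared against, so these two are chosen here purely for illustration. The function names `clique_expansion` and `star_expansion` are hypothetical helpers, not part of HyperX or Spark.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace every hyperedge with a clique over its vertices.

    Returns a set of undirected edges (u, v) with u < v, so edges shared
    by overlapping hyperedges are counted only once.
    """
    edges = set()
    for members in hyperedges:
        for u, v in combinations(sorted(set(members)), 2):
            edges.add((u, v))
    return edges


def star_expansion(hyperedges):
    """Replace hyperedge i with a new hub vertex ("h", i) linked to each member.

    The result is a bipartite graph: one side holds the original vertices,
    the other holds one hub vertex per hyperedge.
    """
    edges = set()
    for i, members in enumerate(hyperedges):
        hub = ("h", i)
        for v in set(members):
            edges.add((hub, v))
    return edges


if __name__ == "__main__":
    # Toy hypergraph: one hyperedge over 100 vertices and one over 3 others.
    hyperedges = [list(range(100)), [100, 101, 102]]
    print("clique expansion edges:", len(clique_expansion(hyperedges)))
    print("star expansion edges:", len(star_expansion(hyperedges)))
```

On this toy input, the single 100-vertex hyperedge alone becomes 4,950 edges under clique expansion, while star expansion keeps the edge count linear in hyperedge size at the cost of one extra vertex per hyperedge. Both conversions enlarge the structure a framework must store and shuffle, compared with keeping each hyperedge as a single first-class object.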
