Temporal Workload-Aware Replicated Partitioning for Social Networks

The most frequent and expensive queries in social networks involve multi-user operations, such as requesting the latest tweets or news feeds of friends. The performance of such queries is heavily dependent on the data partitioning and replication methodologies adopted by the underlying systems. Existing solutions for data distribution in these systems rely on hash- or graph-based approaches that ignore the multi-way relations among data. In this work, we propose a novel data partitioning and selective replication method that uses the temporal information in prior workloads to predict future query patterns. Our method uses the social network structure and the temporality of the interactions among its users to construct a hypergraph that correctly models multi-user operations. It then performs simultaneous partitioning and replication of this hypergraph to reduce the query span while respecting load balance and I/O load constraints under replication. To test our model, we enhance the Cassandra NoSQL system to support selective replication and implement a social network application (a Twitter clone) on top of our enhanced Cassandra. We conduct experiments in a cloud computing environment (Amazon EC2) to evaluate the developed systems. A comparison of the proposed method with hash- and enhanced graph-based schemes indicates that it significantly improves latency and throughput.
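The abstract describes modeling each multi-user query as a hyperedge informed by recent interactions, and measuring the query span of a placement that may replicate users on several partitions. The Python sketch below illustrates those two ideas under stated assumptions; it is not the authors' algorithm. The `(user, friend, timestamp)` interaction format, the exponential-decay weighting controlled by a `half_life` parameter, and the helper names `build_hyperedges`, `query_span`, and `weighted_span` are all hypothetical. Choosing the fewest partitions that together hold every user in a query is a set-cover problem, so `query_span` uses a greedy heuristic that may overestimate the true span. The partitioning step itself, which would typically be delegated to a multilevel hypergraph partitioner, is not shown.

```python
from collections import defaultdict


def build_hyperedges(interactions, now, half_life):
    """Build one weighted hyperedge per querying user (hypothetical scheme).

    interactions: iterable of (user, friend, timestamp) tuples.
    Each interaction adds 0.5 ** ((now - t) / half_life) to the weight, so
    recent interactions count more than old ones (assumed decay model).
    """
    weights = defaultdict(lambda: defaultdict(float))
    for user, friend, t in interactions:
        weights[user][friend] += 0.5 ** ((now - t) / half_life)
    hyperedges = {}
    for user, friends in weights.items():
        # A feed query by `user` touches the user's own data plus each friend read.
        pins = frozenset(friends) | {user}
        hyperedges[user] = (pins, sum(friends.values()))
    return hyperedges


def query_span(pins, placement):
    """Greedy estimate of how many partitions one query must contact.

    placement maps each user to the non-empty set of partitions holding a
    replica of that user's data.
    """
    uncovered = set(pins)
    chosen = set()
    while uncovered:
        # Count how many still-uncovered users each partition could serve.
        coverage = defaultdict(int)
        for u in uncovered:
            for p in placement[u]:
                coverage[p] += 1
        # Pick the partition covering the most users; ties go to the lowest id.
        best = max(sorted(coverage), key=lambda p: coverage[p])
        chosen.add(best)
        uncovered = {u for u in uncovered if best not in placement[u]}
    return len(chosen)


def weighted_span(hyperedges, placement):
    """Sum of query spans, each scaled by its hyperedge's temporal weight."""
    return sum(w * query_span(pins, placement) for pins, w in hyperedges.values())


interactions = [("alice", "bob", 100), ("alice", "carol", 90), ("bob", "alice", 95)]
edges = build_hyperedges(interactions, now=100, half_life=10)
no_replica = {"alice": {0}, "bob": {1}, "carol": {1}}
with_replica = {"alice": {0, 1}, "bob": {1}, "carol": {1}}
print(weighted_span(edges, no_replica), weighted_span(edges, with_replica))
```

In this toy placement, alice's feed query spans two partitions; replicating `alice` on partition 1 as well lets both queries be answered by a single partition. Selective replication weighs exactly this kind of span reduction against the extra load and I/O that each replica adds.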
