A Framework of Hypergraph-Based Data Placement Among Geo-Distributed Datacenters

Data-intensive applications must decide how to place a set of data items across geo-distributed storage nodes. Traditional techniques, such as those used in Hadoop and Cassandra, rely on hashing to balance load among nodes, but they are inefficient for requests that read multiple data items in one transaction, especially when the requests themselves originate from distributed locations. Some recent papers have proposed managed data placement schemes for online social networks, but their focus on that setting limits their scope of application. We propose a general hypergraph-based data placement framework that considers both the performance metrics related to the co-location of associated data and those related to the exact location at which each requested data item is served. Within the framework, we present methods to convert the optimization objectives into hypergraph models and employ hypergraph partitioning to efficiently partition the set of data items and place them on distributed nodes. Further, we extend the scheme to replica placement, where multiple locations must be found for the replicas of the same data item. Through extensive experiments on trace-based datasets, we evaluate the performance of the proposed framework and demonstrate its effectiveness.
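
To make the co-location idea concrete, the following is a minimal sketch, not the framework's actual algorithm. It assumes each multi-item request is modeled as a hyperedge over the data items it reads, and it measures a placement by how many extra nodes each request must contact. A simple greedy heuristic stands in for a dedicated hypergraph partitioner; the names `span_cost`, `greedy_place`, and `capacity` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def span_cost(hyperedges, placement):
    """Sum over requests of (number of distinct nodes touched - 1)."""
    return sum(len({placement[item] for item in edge}) - 1 for edge in hyperedges)


def greedy_place(items, hyperedges, num_nodes, capacity):
    """Greedy stand-in for a hypergraph partitioner (illustrative assumption).

    Each item goes to the node that already holds the most items it is
    co-requested with, subject to a per-node capacity (load balance).
    """
    if num_nodes * capacity < len(items):
        raise ValueError("total capacity is smaller than the number of items")

    # Map each item to the indices of the hyperedges (requests) that contain it.
    incident = defaultdict(list)
    for e_idx, edge in enumerate(hyperedges):
        for item in edge:
            incident[item].append(e_idx)

    placement = {}
    load = [0] * num_nodes
    # Place the most frequently requested items first; break ties by item name.
    for item in sorted(items, key=lambda i: (-len(incident[i]), i)):
        affinity = [0] * num_nodes
        for e_idx in incident[item]:
            for other in hyperedges[e_idx]:
                if other in placement:
                    affinity[placement[other]] += 1
        candidates = [n for n in range(num_nodes) if load[n] < capacity]
        # Prefer high affinity, then low load, then the lowest node index.
        best = max(candidates, key=lambda n: (affinity[n], -load[n], -n))
        placement[item] = best
        load[best] += 1
    return placement


if __name__ == "__main__":
    requests = [("a", "b"), ("a", "b"), ("c", "d"), ("c", "d")]
    placement = greedy_place(["a", "b", "c", "d"], requests, num_nodes=2, capacity=2)
    print(placement, span_cost(requests, placement))
```

A hash-based placement ignores the request structure, so items that are always read together can land on different nodes. The sketch captures only the co-location objective; location-dependent metrics and replica placement, which the framework also covers, are left out.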
