CoShare: A Cost-Effective Data Sharing System for Data Center Networks

Numerous research groups and other organizations collect data from popular data sources such as online social networks. This leads to the problem of data islands, in which the data sits isolated and idle, of no use to the community at large. Using existing centralized solutions such as Dropbox to replicate data to all interested parties is prohibitively costly, given the large size of the datasets. A practical alternative is a Peer-to-Peer (P2P) approach that replicates data in a self-organized manner. However, existing P2P approaches focus on minimizing download time without taking bandwidth cost into account. In this paper, we present CoShare, a P2P-inspired, decentralized, cost-effective sharing system for data replication. CoShare allows users to specify their requirements for data sharing tasks and maps these requirements into resource requirements for data transfer. Through extensive simulations, we demonstrate that CoShare finds desirable tradeoffs between cost and performance under varying user requirements and request arrival rates.
