Data Placement for Multi-Tenant Data Federation on the Cloud

Users' privacy concerns and the enforcement of data security and privacy laws make it increasingly difficult for organizations to share data. Data federation opens new opportunities for data-related cooperation among organizations by providing abstract data interfaces. With the development of cloud computing, organizations store data on the cloud to gain elasticity and scalability for data processing. Existing data placement approaches generally consider only one objective, either execution time or monetary cost, and do not consider data partitioning under hard constraints. In this paper, we propose an approach that enables data processing on the cloud with data from different organizations. The approach consists of a data federation platform named FedCube and a Lyapunov-based data placement algorithm. FedCube enables data processing on the cloud, and the data placement algorithm creates a plan to partition and store data on the cloud so as to achieve multiple objectives while satisfying the constraints, based on a multi-objective cost model. The cost model is composed of two objectives: reducing monetary cost and reducing execution time. An experimental evaluation shows that our proposed algorithm significantly reduces the total cost (by up to 69.8%) compared with existing approaches.
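
To make the idea of a two-objective cost model concrete, the sketch below combines execution time and monetary cost into one normalized weighted sum and picks the cheapest placement plan that satisfies hard limits on both. It is only an illustration under stated assumptions: the weighted-sum form, the names `Plan`, `total_cost`, and `best_plan`, the reference values used for normalization, and the sample numbers are all hypothetical and are not taken from the paper. The exhaustive selection shown here also stands in for, and is not, the Lyapunov-based algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Plan:
    """A hypothetical data placement plan with its estimated costs."""
    name: str
    exec_time: float  # estimated execution time (seconds), made-up value
    money: float      # estimated monetary cost (dollars), made-up value


def total_cost(plan: Plan, w_time: float, ref_time: float, ref_money: float) -> float:
    """Weighted sum of normalized time and normalized monetary cost.

    w_time is the weight of execution time in [0, 1]; the monetary cost
    gets the remaining weight. ref_time and ref_money are assumed
    reference values that make both terms unitless.
    """
    return w_time * plan.exec_time / ref_time + (1.0 - w_time) * plan.money / ref_money


def best_plan(plans: List[Plan], w_time: float, ref_time: float, ref_money: float,
              max_time: float, max_money: float) -> Optional[Plan]:
    """Return the lowest-cost plan that meets both hard constraints, or None."""
    feasible = [p for p in plans if p.exec_time <= max_time and p.money <= max_money]
    if not feasible:
        return None
    return min(feasible, key=lambda p: total_cost(p, w_time, ref_time, ref_money))


# Illustrative candidates only; the storage names and numbers are assumptions.
PLANS = [
    Plan("fast_storage", exec_time=100.0, money=8.0),
    Plan("mixed_storage", exec_time=150.0, money=5.0),
    Plan("cold_storage", exec_time=400.0, money=2.0),
]

if __name__ == "__main__":
    chosen = best_plan(PLANS, w_time=0.5, ref_time=100.0, ref_money=10.0,
                       max_time=500.0, max_money=6.0)
    print(chosen.name if chosen else "no feasible plan")
```

In this toy setting, tightening the budget or shifting the weight toward monetary cost changes which plan wins, which is the kind of trade-off a multi-objective cost model is meant to expose.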
