Dynamic Estimation for Medical Data Management in a Cloud Federation

Data sharing is important in the medical domain. Sharing data allows large-scale analysis over many data sources, which yields more accurate results, especially for rare diseases where local datasets are small. Cloud federations represent major progress in sharing medical data stored on different cloud platforms, such as Amazon, Microsoft, and Google Cloud. They also enable access to the distributed data of mobile patients. The pay-as-you-go model in cloud federations raises an important issue for Multi-Objective Query Processing (MOQP): finding a Query Execution Plan that matches users' preferences, such as response time, monetary cost, and quality. However, optimizing a query in a cloud federation is complex because of increasing heterogeneity and additional variance, especially due to the wide range of communication and pricing models. In such a context, it is difficult to provide estimates accurate enough to make relevant decisions. To address this problem, we present the Dynamic Regression Algorithm (DREAM), which provides accurate estimation in a cloud federation with limited historical data. DREAM focuses on reducing the size of the historical data while maintaining estimation accuracy. The proposed algorithm is integrated into the Intelligent Resource Scheduler, a solution for heterogeneous databases, to solve MOQP in cloud federations, and it is validated with preliminary experiments on a decision support benchmark (TPC-H).
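The idea of estimating from a reduced history can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single predictor (for example, input size) and a single cost metric (for example, response time), uses ordinary least squares, and grows a window over the most recent observations until the coefficient of determination reaches a required level. The names `fit_line`, `fit_recent_window`, `min_size`, and `r2_required` are hypothetical and chosen only for this illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, r2)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # If all y values are equal, the constant fit is exact.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


def fit_recent_window(history, min_size=3, r2_required=0.8):
    """Fit on the smallest window of most recent observations that is accurate enough.

    history: list of (x, y) pairs, oldest first (assumed layout).
    Returns (a, b, window_size). Falls back to the full history
    if no window reaches r2_required.
    """
    if len(history) < min_size:
        raise ValueError("not enough observations")
    for size in range(min_size, len(history) + 1):
        window = history[-size:]
        xs = [x for x, _ in window]
        ys = [y for _, y in window]
        a, b, r2 = fit_line(xs, ys)
        if r2 >= r2_required:
            return a, b, size
    return a, b, len(history)


# Older observations come from a different regime (e.g. another pricing
# or load situation); the three most recent ones follow y = 2x + 1.
history = [(1, 50), (2, 10), (3, 40), (4, 9), (5, 11), (6, 13)]
a, b, size = fit_recent_window(history)
estimate = a + b * 7  # estimated cost for a new input of size 7
```

In this toy run the three most recent observations already fit well, so the older, inconsistent observations are never used. The threshold and minimum window size are tuning choices in this sketch, and a multi-variable model would be needed for realistic query cost estimation.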
