Cost-Effective Resource Configurations for Executing Data-Intensive Workloads in Public Clouds

Co-Authorship
Dedications
Acknowledgements
Statement of Originality

Chapter 1: Introduction
1.1 Cloud Computing and its Offerings to Large-Scale Data Processing
1.2 Examples of Data Growth in Scientific and Commercial Domains
1.3 The Need for Workload Management and Resource Provisioning
1.4 Thesis Contributions
1.5 Thesis Statement
1.6 Thesis Organization

Chapter 2: Background and State-of-the-Art
2.1 Workload Management Taxonomy
2.2 Data Processing: Taxonomy and Survey
2.2.1 MapReduce
2.2.2 Dataflow-processing
2.2.3 Shared-nothing Relational Processing
2.2.4 Stream-processing
2.2.5 MR&DB Hybrid
2.2.6 Discussion
2.3 Provisioning: Taxonomy & Survey
2.3.1 Scaling
2.3.2 Migration
2.3.3 Surge Computing
2.3.4 Discussion
2.4 Conclusions
2.4.1 Open Problems

Chapter 3: Overview of Our Approach
3.1 Problem Statement
3.2 Framework
3.3 Evaluation Setup
3.3.1 Tenant Databases and Request Types for Creating Workloads
3.3.2 Selection of VM Types
3.4 Outline of the Remaining Thesis

Chapter 4: Experiment-Based Performance Models
4.1 Motivation
4.2 Background
4.3 Variables in building a Performance Model
4.4 Building the Performance Model
4.4.1 Sampling the Space of Request Mixes
4.4.2 Experiment-driven Data Collection
4.4.3 Constructing the Request Mix Model
4.4.4 Determining a suitable number of samples
4.4.5 Comparison of Prediction Techniques
4.5 Evaluation
4.5.1 Experiment Setup and Validation Method
4.5.2 Data Patterns: Identification and Treatment
4.5.2.1 Data Classes
4.5.3 Validation Results
4.5.3.1 Large VM Type (Optimal MPL=75)
4.5.3.2 Small VM type (Optimal MPL=14)
4.5.3.3 Xlarge VM type (Optimal MPL=115)
4.6 Modeling Non-linear Behaviour
4.7 Conclusions

Chapter 5: Analytical Cost Model
5.1 Motivation
5.2 Background
5.3 Different Resource Types and Pricing Schemes in IaaS Clouds
5.3.1 Resource Types and Sub Types
5.3.2 Pricing Schemes
5.4 Cost Model
5.5 Evaluation
5.5.1 Tenants and Workloads
5.5.2 Cost Model for the Amazon cloud
5.5.3 Experiments
5.5.3.1 VM Type
5.5.3.2 Workload Mix
5.5.3.3 SLA Penalties
5.6 Conclusions

Chapter 6: Heuristic-based Configuration Selection
6.1 Motivation
6.2 Background
6.3 Determining a Cost-Effective Configuration
6.3.1 Modifications
6.3.2 Search Algorithms
6.4 Evaluation