Paper Information - Cost-Effective Resource Configurations for Executing Data-Intensive Workloads in Public Clouds


Co-Authorship
Dedications
Acknowledgements
Statement of Originality
Chapter 1: Introduction
  1.1 Cloud Computing and its Offerings to Large-Scale Data Processing
  1.2 Examples of Data Growth in Scientific and Commercial Domains
  1.3 The Need for Workload Management and Resource Provisioning
  1.4 Thesis Contributions
  1.5 Thesis Statement
  1.6 Thesis Organization
Chapter 2: Background and State-of-the-Art
  2.1 Workload Management Taxonomy
  2.2 Data Processing: Taxonomy and Survey
    2.2.1 MapReduce
    2.2.2 Dataflow-processing
    2.2.3 Shared-nothing Relational Processing
    2.2.4 Stream-processing
    2.2.5 MR&DB Hybrid
    2.2.6 Discussion
  2.3 Provisioning: Taxonomy & Survey
    2.3.1 Scaling
    2.3.2 Migration
    2.3.3 Surge Computing
    2.3.4 Discussion
  2.4 Conclusions
    2.4.1 Open Problems
Chapter 3: Overview of Our Approach
  3.1 Problem Statement
  3.2 Framework
  3.3 Evaluation Setup
    3.3.1 Tenant Databases and Request Types for Creating Workloads
    3.3.2 Selection of VM Types
  3.4 Outline of the Remaining Thesis
Chapter 4: Experiment-Based Performance Models
  4.1 Motivation
  4.2 Background
  4.3 Variables in building a Performance Model
  4.4 Building the Performance Model
    4.4.1 Sampling the Space of Request Mixes
    4.4.2 Experiment-driven Data Collection
    4.4.3 Constructing the Request Mix Model
    4.4.4 Determining a suitable number of samples
    4.4.5 Comparison of Prediction Techniques
  4.5 Evaluation
    4.5.1 Experiment Setup and Validation Method
    4.5.2 Data Patterns: Identification and Treatment
      4.5.2.1 Data Classes
    4.5.3 Validation Results
      4.5.3.1 Large VM Type (Optimal MPL=75)
      4.5.3.2 Small VM type (Optimal MPL=14)
      4.5.3.3 Xlarge VM type (Optimal MPL=115)
  4.6 Modeling Non-linear Behaviour
  4.7 Conclusions
Chapter 5: Analytical Cost Model
  5.1 Motivation
  5.2 Background
  5.3 Different Resource Types and Pricing Schemes in IaaS Clouds
    5.3.1 Resource Types and Sub Types
    5.3.2 Pricing Schemes
  5.4 Cost Model
  5.5 Evaluation
    5.5.1 Tenants and Workloads
    5.5.2 Cost Model for the Amazon cloud
    5.5.3 Experiments
      5.5.3.1 VM Type
      5.5.3.2 Workload Mix
      5.5.3.3 SLA Penalties
  5.6 Conclusions
Chapter 6: Heuristic-based Configuration Selection
  6.1 Motivation
  6.2 Background
  6.3 Determining a Cost-Effective Configuration
    6.3.1 Modifications
    6.3.2 Search Algorithms
  6.4 Evaluation

Rizwan Mian
