CloudFlow: A data-aware programming model for cloud workflow applications on modern HPC systems

This work was supported by NPRP grant # 09-1116-1-172 from the Qatar National Research Fund (a member of Qatar Foundation), by the Ministry of Science and Technology of China under the National 973 Basic Research Program (Grant No. 2013CB228206), and by the National Natural Science Foundation of China (Grant Nos. 61472200 and 61233016).

[1]  Fan Zhang,et al.  Performance Variations in Resource Scaling for MapReduce Applications on Private and Public Clouds , 2014, 2014 IEEE 7th International Conference on Cloud Computing.

[2]  Ashraf Aboulnaga,et al.  ReStore: Reusing Results of MapReduce Jobs , 2012, Proc. VLDB Endow..

[3]  Ian Foster,et al.  The Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition , 1998, The Grid 2, 2nd Edition.

[4]  Songting Chen,et al.  Cheetah , 2010, Proc. VLDB Endow..

[5]  Ian Foster,et al.  Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers , 2009, HiPC 2009.

[6]  Samee Ullah Khan,et al.  A goal programming based energy efficient resource allocation in data centers , 2012, The Journal of Supercomputing.

[7]  Fan Zhang,et al.  Cluster-Size Scaling and MapReduce Execution Times , 2013, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science.

[8]  Rajkumar Buyya,et al.  High Performance Mass Storage and Parallel I/O: Technologies and Applications , 2001 .

[9]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[10]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[11]  Rajkumar Buyya,et al.  High Performance Cluster Computing , 1999 .

[12]  Jeffrey D. Ullman,et al.  Optimizing Multiway Joins in a Map-Reduce Environment , 2011, IEEE Transactions on Knowledge and Data Engineering.

[13]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[14]  Bin Cong,et al.  Scalable Parallel Computing: Technology, Architecture, Programming , 1999, Scalable Comput. Pract. Exp..

[15]  Michael Isard,et al.  DryadInc: Reusing Work in Large-scale Computations , 2009, HotCloud.

[16]  Jignesh M. Patel,et al.  A comparison of join algorithms for log processing in MapReduce , 2010, SIGMOD Conference.

[17]  Samee Ullah Khan,et al.  A goal programming approach for the joint optimization of energy consumption and response time in computational grids , 2009, 2009 IEEE 28th International Performance Computing and Communications Conference.

[18]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[19]  Pramod Bhatotia,et al.  Incoop: MapReduce for incremental computations , 2011, SoCC.

[20]  Mirek Riedewald,et al.  Processing theta-joins using MapReduce , 2011, SIGMOD '11.

[21]  Anthony K. H. Tung,et al.  MAP-JOIN-REDUCE: Toward Scalable and Efficient Data Analysis on Large Clusters , 2011, IEEE Transactions on Knowledge and Data Engineering.

[22]  Min Wang,et al.  Efficient Multi-way Theta-Join Processing Using MapReduce , 2012, Proc. VLDB Endow..

[23]  Fan Zhang,et al.  Dataset Scaling and MapReduce Performance , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[24]  Abhijit Gosavi,et al.  Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .

[25]  Carlos Guestrin,et al.  Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[26]  Kun-Lung Wu,et al.  DEDUCE: at the intersection of MapReduce and stream processing , 2010, EDBT '10.

[27]  Michael D. Ernst,et al.  HaLoop , 2010, Proc. VLDB Endow..

[28]  Jeffrey D. Ullman,et al.  Optimizing joins in a map-reduce environment , 2010, EDBT '10.

[29]  Douglas Stott Parker,et al.  Map-reduce-merge: simplified relational data processing on large clusters , 2007, SIGMOD '07.

[30]  Achim Streit,et al.  MapReduce across Distributed Clusters for Data-intensive Applications , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[31]  Ami Marowka,et al.  The GRID: Blueprint for a New Computing Infrastructure , 2000, Parallel Distributed Comput. Pract..

[32]  Chao Tian,et al.  Nova: continuous Pig/Hadoop workflows , 2011, SIGMOD '11.

[33]  Joseph M. Hellerstein,et al.  GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.

[34]  Rajiv Ranjan,et al.  G-Hadoop: MapReduce across distributed data centers for data-intensive computing , 2013, Future Gener. Comput. Syst..

[35]  Albert Y. Zomaya,et al.  Data-Intensive Workload Consolidation for the Hadoop Distributed File System , 2012, 2012 ACM/IEEE 13th International Conference on Grid Computing.

[37]  Yong Zhao,et al.  Many-task computing for grids and supercomputers , 2008, 2008 Workshop on Many-Task Computing on Grids and Supercomputers.

[38]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[39]  Leonardo Neumeyer,et al.  S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[40]  Qutaibah M. Malluhi,et al.  ConMR: Concurrent MapReduce Programming Model for Large Scale Shared-Data Applications , 2013, 2013 42nd International Conference on Parallel Processing.