Using MinMax-Memory Claims to Improve In-Memory Workflow Computations in the Cloud

In this paper, we consider improving scientific workflows in cloud environments where data transfers between tasks are performed via provisioned in-memory caching as a service, instead of relying entirely on slower disk-based file systems. This improvement is not free, since cloud services are usually charged on a "pay-as-you-go" basis. As a consequence, workflow tenants have to estimate the amount of memory they are willing to pay for. Given the intrinsic complexity of workflows, an accurate prediction is very hard to make, and an inaccurate one leads to either oversubscription or undersubscription, that is, to unproductive spending or to performance degradation. To address this problem, we propose the concept of a minmax-memory claim (MMC) to achieve cost-effective workflow computations in in-memory cloud computing environments. The MMC is defined as the minimum amount of memory required to finish the workflow without compromising its maximum concurrency. With an MMC, workflow tenants can achieve the best performance available from in-memory computing while minimizing cost. We present a procedure for finding the MMC of workflows with arbitrary graphs in general, and we develop efficient optimal algorithms for some well-structured workflows in particular. To further show the value of this concept, we also implement these algorithms and apply them, through a simulation study, to improve deadlock resolution in workflow-based workloads when memory resources are constrained.
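To make the MMC definition concrete, the following is a minimal sketch under a deliberately simplified model; it is not the procedure or the algorithms of the paper. It assumes that each task holds a fixed amount of memory while it runs, that tasks with no dependency path between them may run concurrently, and that the claim is the largest total memory of any such set. The names `reachable`, `toy_memory_claim`, `dag`, and `mem` are hypothetical, and the brute-force search is exponential, so it only suits very small graphs.

```python
from itertools import combinations

def reachable(dag):
    """Return {task: set of all descendants} for a DAG given as {task: [successors]}."""
    desc = {}

    def visit(u):
        if u not in desc:
            desc[u] = set()
            for v in dag.get(u, []):
                desc[u].add(v)
                desc[u] |= visit(v)
        return desc[u]

    for u in dag:
        visit(u)
    return desc

def toy_memory_claim(dag, mem):
    """Brute-force (max concurrency, peak memory) under the toy model.

    Assumption (not from the paper): mem[t] is the memory task t holds while
    running, and any set of pairwise independent tasks may run together.
    """
    desc = reachable(dag)
    tasks = list(mem)

    def independent(group):
        # No task in the group is an ancestor or descendant of another.
        return all(b not in desc[a] and a not in desc[b]
                   for a, b in combinations(group, 2))

    best_width, best_mem = 0, 0
    for k in range(1, len(tasks) + 1):
        for group in combinations(tasks, k):
            if independent(group):
                best_width = max(best_width, k)
                best_mem = max(best_mem, sum(mem[t] for t in group))
    return best_width, best_mem

# Hypothetical fork-join workflow: split -> {a, b, c} -> merge.
dag = {"split": ["a", "b", "c"], "a": ["merge"], "b": ["merge"],
       "c": ["merge"], "merge": []}
mem = {"split": 4, "a": 2, "b": 3, "c": 1, "merge": 5}

print(toy_memory_claim(dag, mem))  # (3, 6)
```

In this example the three middle tasks can run together, so the maximum concurrency is 3 and they jointly need 6 units of memory. Under the toy model, claiming less than 6 would force at least one of them to wait, while claiming more would be paid for but never used.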

[1]  M. A. Lawley,et al.  Efficient implementations of Banker's algorithm for deadlock avoidance in flexible manufacturing systems , 1997, 1997 IEEE 6th International Conference on Emerging Technologies and Factory Automation Proceedings, EFTA '97.

[2]  Sheau-Dong Lang An Extended Banker's Algorithm for Deadlock Avoidance , 1999, IEEE Trans. Software Eng..

[3]  Johan Montagnat,et al.  Grid-enabled workflows for data intensive medical applications , 2005, 18th IEEE Symposium on Computer-Based Medical Systems (CBMS'05).

[4]  Eric S. Chung,et al.  SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place , 2012 .

[5]  Scott Shenker,et al.  Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks , 2014, SoCC.

[6]  G. Bruce Berriman,et al.  Data Sharing Options for Scientific Workflows on Amazon EC2 , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[7]  Martin Schulz,et al.  Modeling the Impact of Reduced Memory Bandwidth on HPC Applications , 2014, Euro-Par.

[8]  Viet Ha Nguyen,et al.  Static Performance Evaluation for Memory-Bound Computing: The MBRAM Model , 2004, PDPTA.

[9]  Ted Wobber,et al.  Moderately hard, memory-bound functions , 2005, TOIT.

[10]  P. O. Hulth The AMANDA Collaboration The Amanda Experiment , 1996 .

[11]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[12]  Patric R. J. Östergård,et al.  A fast algorithm for the maximum clique problem , 2002, Discret. Appl. Math..

[13]  David R. Wood,et al.  An algorithm for finding a maximum clique in a graph , 1997, Oper. Res. Lett..

[14]  Yang Wang,et al.  Maximizing Active Storage Resources with Deadlock Avoidance in Workflow-Based Computations , 2013, IEEE Transactions on Computers.

[15]  Toshimi Minoura,et al.  Deadlock avoidance revisited , 1982, JACM.

[16]  Dan Feng,et al.  CDRM: A Cost-Effective Dynamic Replication Management Scheme for Cloud Storage Cluster , 2010, 2010 IEEE International Conference on Cluster Computing.

[17]  Yang Wang,et al.  Boosting Parallel File System Performance via Heterogeneity-Aware Selective Data Layout , 2016, IEEE Transactions on Parallel and Distributed Systems.

[18]  Franz-Josef Pfreundt,et al.  MapReduce in GPI-Space , 2013, Euro-Par Workshops.

[19]  Marta Mattoso,et al.  Exploring Molecular Evolution Reconstruction Using a Parallel Cloud Based Scientific Workflow , 2012, BSB.

[20]  Zhiyong Lu,et al.  Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations , 2004, Nucleic Acids Res..

[21]  G. Bruce Berriman,et al.  On the Use of Cloud Computing for Scientific Workflows , 2008, 2008 IEEE Fourth International Conference on eScience.

[22]  Mei-Hui Su,et al.  Characterization of scientific workflows , 2008, 2008 Third Workshop on Workflows in Support of Large-Scale Science.

[23]  Yang Wang,et al.  DDS: A deadlock detection-based scheduling algorithm for workflow computations in HPC systems with storage constraints , 2013, Parallel Comput..

[24]  Ling Liu,et al.  Cost-Effective Resource Provisioning for MapReduce in a Cloud , 2015, IEEE Transactions on Parallel and Distributed Systems.

[25]  Alain Hertz,et al.  A sequential elimination algorithm for computing bounds on the clique number of a graph , 2008, Discret. Optim..

[26]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[27]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[28]  Yang Wang,et al.  Improving Performance of Parallel I/O Systems through Selective and Layout-Aware SSD Cache , 2016, IEEE Transactions on Parallel and Distributed Systems.

[29]  Arnold L. Rosenberg,et al.  On scheduling mesh-structured computations for Internet-based computing , 2004, IEEE Transactions on Computers.

[30]  Raheem A. Beyah,et al.  Using network traffic to passively detect under utilized resources in high performance cluster grid computing environments , 2007, GridNets '07.

[31]  P. O. Hulth The Amanda Experiment , 1996 .

[32]  Gregory Chockler,et al.  Data caching as a cloud service , 2010, LADIS '10.

[33]  Gagan Agrawal,et al.  Elastic Cloud Caches for Accelerating Service-Oriented Computations , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[34]  Chase Qishi Wu,et al.  A cost-effective scheduling algorithm for scientific workflows in clouds , 2012, 2012 IEEE 31st International Performance Computing and Communications Conference (IPCCC).

[35]  Ymir Vigfusson,et al.  Design and implementation of caching services in the cloud , 2011, IBM J. Res. Dev..

[36]  D.A. Reed,et al.  Input/Output Characteristics of Scalable Parallel Applications , 1995, Proceedings of the IEEE/ACM SC95 Conference.