Development and Operation of Elastic Parallel Tree Search Applications Using TASKWORK

Cloud resources can be dynamically provisioned according to application-specific requirements and are paid for on a per-use basis. This gives rise to a new concept for parallel processing: elastic parallel computations. However, it is still an open research question to what extent parallel applications can benefit from elastic scaling, which requires resource adaptation at runtime and corresponding coordination mechanisms. In this work, we analyze how to address these system-level challenges in the context of developing and operating elastic parallel tree search applications. Based on our findings, we discuss the design and implementation of TASKWORK, a cloud-aware runtime system specifically designed for elastic parallel tree search, which enables the implementation of elastic applications by means of higher-level development frameworks. We show how to implement an elastic parallel branch-and-bound application based on an exemplary development framework and report on our experimental evaluation, which also considers several benchmarks for parallel tree search.
