Paper Information - Performance Characterization and Modeling of Serverless and HPC Streaming Applications

Industrial and scientific streaming applications require support for different types of processing and the management of heterogeneous infrastructure over a dynamic range of scales: from the edge to the cloud and HPC, and intermediate resources. Serverless is an emerging service that combines high-level middleware services, such as distributed execution engines for managing tasks, with low-level infrastructure. It offers the potential of usability and scalability but adds to the complexity of managing heterogeneous and dynamic resources. In response, we extend Pilot-Streaming to support serverless platforms. Pilot-Streaming provides a unified abstraction for resource management across HPC, cloud, and serverless, and allocates resource containers independently of the application workload, removing the need to write resource-specific code. Understanding the performance and scaling characteristics of streaming applications and infrastructure presents another challenge. StreamInsight provides insight into the performance of streaming applications and infrastructure, their selection, configuration, and scaling behavior. Underlying StreamInsight is the universal scalability law, which permits the accurate quantification of scalability properties of streaming applications. Using experiments on HPC and AWS Lambda, we demonstrate that StreamInsight provides an accurate model for a variety of application characteristics, e.g., machine learning model sizes and resource configurations.
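The universal scalability law mentioned in the abstract is commonly written as X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where N is the degree of parallelism, λ the throughput of a single unit, σ the contention coefficient, and κ the coherency coefficient. The following is a minimal sketch of that general formula only; it is not StreamInsight's implementation, the function names are hypothetical, and the parameter values are illustrative rather than taken from the paper.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Throughput predicted by the universal scalability law.

    n:     degree of parallelism (e.g., number of workers or partitions)
    lam:   throughput of a single unit (assumed measured at n = 1)
    sigma: contention coefficient (serialized fraction of the work)
    kappa: coherency coefficient (pairwise coordination cost)
    """
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_parallelism(sigma, kappa):
    """Parallelism at which USL throughput peaks: sqrt((1 - sigma) / kappa)."""
    if kappa <= 0:
        # Without a coherency penalty the curve never turns downward.
        return math.inf
    return math.sqrt((1.0 - sigma) / kappa)


if __name__ == "__main__":
    # Illustrative coefficients, not fitted to any measurement.
    lam, sigma, kappa = 100.0, 0.1, 0.001
    for n in (1, 8, 16, 30, 64):
        print(n, round(usl_throughput(n, lam, sigma, kappa), 1))
    print("peak at N =", peak_parallelism(sigma, kappa))
```

With κ = 0 the expression reduces to Amdahl's law, and with σ = κ = 0 it reduces to linear scaling. In practice the coefficients would be estimated by fitting measured throughput at several values of N.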

Shantenu Jha | Andre Luckow

[1] Francine Berman,et al. Application-Level Scheduling on Distributed Heterogeneous Networks , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[2] Shantenu Jha,et al. Pilot-Data: An abstraction for distributed data , 2013, J. Parallel Distributed Comput.

[3] Jeyhun Karimov,et al. Benchmarking Distributed Stream Processing Engines , 2018, ICDE.

[4] Geoffrey C. Fox,et al. Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[5] Ion Stoica,et al. Occupy the cloud: distributed computing for the 99% , 2017, SoCC.

[6] Ramakrishnan Kannan,et al. Mini-apps for high performance data analysis , 2016, 2016 IEEE International Conference on Big Data (Big Data).

[7] Geoffrey C. Fox,et al. Towards a Comprehensive Set of Big Data Benchmarks , 2014, High Performance Computing Workshop.

[8] Perry Cheng,et al. The serverless trilemma: function composition for serverless computing , 2017, Onward!.

[9] Neil J. Gunther,et al. Analyze System Scalability with the Universal Scalability Law , 2014.

[10] Perry Cheng,et al. Serverless Computing: Current Trends and Open Problems , 2017, Research Advances in Cloud Computing.

[11] Shantenu Jha,et al. P∗: A model of pilot-abstractions , 2012, 2012 IEEE 8th International Conference on E-Science.

[12] Shantenu Jha,et al. Pilot-MapReduce: an extensible and flexible MapReduce implementation for distributed data , 2012, MapReduce '12.

[13] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[14] David A. Patterson,et al. Cloud Programming Simplified: A Berkeley View on Serverless Computing , 2019, ArXiv.

[15] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967.

[16] Jay Kreps,et al. Kafka: a Distributed Messaging System for Log Processing , 2011.

[17] Geoffrey C. Fox,et al. Status of Serverless Computing and Function-as-a-Service(FaaS) in Industry and Research , 2017, ArXiv.

[18] Yaoliang Yu,et al. Petuum: A New Platform for Distributed Machine Learning on Big Data , 2015, IEEE Trans. Big Data.

[19] Neil J. Gunther. A Simple Capacity Model of Massively Parallel Transaction Systems , 1993, Int. CMG Conference.

[20] Micah Beck,et al. Harnessing the Computing Continuum for Programming Our World , 2020, Fog Computing.

[21] Ning Wang,et al. Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems , 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).

[22] Douglas Thain,et al. A Lightweight Model for Right-Sizing Master-Worker Applications , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[23] Shrideep Pallickara,et al. Online Scheduling and Interference Alleviation for Low-Latency, High-Throughput Processing of Data Streams , 2017, IEEE Transactions on Parallel and Distributed Systems.

[24] Minlan Yu,et al. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics , 2017, NSDI.

[25] Shantenu Jha,et al. SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.

[26] Zhuo Liu,et al. Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[27] Neil J. Gunther,et al. Unification of Amdahl's Law, LogP and Other Performance Models for Message-Passing Architectures , 2005, IASTED PDCS.

[28] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).

[29] Shantenu Jha,et al. Pilot-Streaming: A Stream Processing Framework for High-Performance Computing , 2018, 2018 IEEE 14th International Conference on e-Science (e-Science).

[30] Ion Stoica,et al. Numpywren: Serverless Linear Algebra , 2018, ArXiv.

[31] Michael Stonebraker,et al. Linear Road: A Stream Data Management Benchmark , 2004, VLDB.

[32] Mashrur Chowdhury,et al. A Distributed Message Delivery Infrastructure for Connected Vehicle Technology Applications , 2018, IEEE Transactions on Intelligent Transportation Systems.

[33] Ion Stoica,et al. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics , 2016, NSDI.