Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines

The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, "Is this intersection congested?" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation (e.g., sampling rate, batch size, or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, (b) the optimal configuration depends on users' desired latency and cost targets, and (c) input video contents may exercise different paths in the DAG and produce a variable amount of intermediate results. Existing video analytics and processing systems leave it to users to manually configure operations and select hardware resources. We present Llama: a heterogeneous and serverless framework for auto-tuning video pipelines. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. This makes the problem of auto-tuning large video pipelines tractable and allows Llama to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that, compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.
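The two-step idea in the abstract (derive a latency target per operation invocation, then pick the cheapest configuration that meets it) can be illustrated with a minimal Python sketch. This is not Llama's actual algorithm: the `Config` type, the function names, the proportional budget split, and the fall-back to the fastest configuration are all assumptions made here for illustration, and the latency and cost numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration of one operation (e.g. hardware type)."""
    name: str
    latency_s: float  # assumed profiled latency per invocation, in seconds
    cost: float       # assumed cost per invocation, arbitrary units


def pick_config(configs, target_s):
    """Return the cheapest config meeting the target, else the fastest one."""
    feasible = [c for c in configs if c.latency_s <= target_s]
    if feasible:
        return min(feasible, key=lambda c: c.cost)
    return min(configs, key=lambda c: c.latency_s)


def plan(pipeline, latency_target_s):
    """Walk a linear pipeline, assigning one config per operation.

    Assumption: the remaining end-to-end budget is split among the
    remaining operations in proportion to their fastest known latency.
    """
    remaining = latency_target_s
    chosen = []
    for i, (op, configs) in enumerate(pipeline):
        fastest = [min(c.latency_s for c in cfgs) for _, cfgs in pipeline[i:]]
        share = fastest[0] / sum(fastest)
        target = max(remaining, 0.0) * share  # per-invocation latency target
        cfg = pick_config(configs, target)
        chosen.append((op, cfg.name))
        remaining -= cfg.latency_s  # budget left for downstream operations
    return chosen


# Hypothetical two-operation pipeline with invented numbers.
pipeline = [
    ("decode", [Config("cpu", 2.0, 1.0), Config("gpu", 0.5, 4.0)]),
    ("detect", [Config("cpu", 6.0, 2.0), Config("gpu", 1.0, 5.0)]),
]
```

Under these assumptions, a loose target lets every operation use the cheaper CPU configuration, while a tight target pushes operations onto the faster, costlier GPU configuration. Recomputing the target per operation from the remaining budget is what lets such a scheme react when an upstream operation runs faster or slower than expected.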
