MicroEdge: a multi-tenant edge cluster system architecture for scalable camera processing

With the proliferation of high-bandwidth cameras and AR/VR devices, and their increasing use in situation awareness applications, edge computing is gaining prominence as a way to meet the throughput requirements of such applications. This work focuses on camera applications that perform real-time machine learning inference on camera frames. We find that machine learning based camera applications suffer from hardware resource fragmentation because models either under-utilize or over-utilize the accelerator. Meanwhile, fine-grained resource sharing is challenging to support on accelerators such as TPUs, because they can only process requests sequentially in a run-to-completion fashion. We present MicroEdge, a multi-tenant, low-cost edge cluster for camera processing applications. MicroEdge provides multi-tenancy support for Coral TPUs by extending K3s, an edge-specific distribution of Kubernetes. Through an admission control algorithm, it assigns fractions of a TPU commensurate with each application pipeline's requirements, so that the TPUs are fully utilized. Using real-time camera processing applications and a real-world trace, we show that MicroEdge can support up to 2.8x as many camera streams for a given hardware configuration as vanilla K3s, while maintaining scalability and performance requirements.
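
The idea of fractional TPU assignment through admission control can be illustrated with a small sketch. This is not MicroEdge's algorithm, which the abstract does not specify. It assumes a simple first-fit policy, and it assumes that a stream's demand can be approximated as the fraction of each second the TPU spends on its inferences. The names `Tpu`, `tpu_share`, and `admit` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tpu:
    """One accelerator, tracked as a divisible unit of capacity (hypothetical model)."""
    name: str
    free: float = 1.0                      # fraction of the TPU still unassigned
    tenants: dict = field(default_factory=dict)


def tpu_share(fps: float, inference_ms: float) -> float:
    """Assumed demand model: fraction of each second spent on inference."""
    return fps * inference_ms / 1000.0


def admit(tpus: list, stream_id: str, demand: float):
    """First-fit admission control (an assumption, not the paper's policy).

    Places the stream on the first TPU with enough free capacity and
    returns that TPU's name, or returns None if the stream is rejected.
    """
    if not 0.0 < demand <= 1.0:
        raise ValueError("demand must be a fraction in (0, 1]")
    for tpu in tpus:
        # Small tolerance so that floating-point residue does not cause
        # a spurious rejection when a TPU is filled exactly.
        if tpu.free + 1e-9 >= demand:
            tpu.free = max(0.0, tpu.free - demand)
            tpu.tenants[stream_id] = demand
            return tpu.name
    return None


if __name__ == "__main__":
    cluster = [Tpu("tpu-0"), Tpu("tpu-1")]
    for cam, share in [("cam-a", 0.6), ("cam-b", 0.6), ("cam-c", 0.4), ("cam-d", 0.5)]:
        print(cam, "->", admit(cluster, cam, share))
```

Under this sketch, two streams that each need 0.6 of a TPU land on separate devices, a third stream needing 0.4 fills the remainder of the first device, and a fourth needing 0.5 is rejected because no single TPU has that much capacity left. Whole-device allocation would have rejected the third stream as well.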
