Quartz: time-as-a-service for coordination in geo-distributed systems

Geo-distributed systems, from databases to cyber-physical applications, increasingly rely on a shared and precise notion of time to achieve coordination. This is especially true for cyber-physical applications, which span local-scale robotic coordination, city-scale traffic management, and regional- or planetary-scale smart grids. Each of these applications uses event orderings and timing offsets to make real-time decisions and perform coordinated actions at its distributed endpoints. The emergence of edge computing, intended specifically to facilitate low-latency decision-making, is driving a trend in which multiple cyber-physical and software applications with different timing requirements coexist in the cloud and at the edge. To enable such fault-tolerant, time-based coordinated applications on multi-tenant geo-scale infrastructure, we introduce the Quartz framework, which exposes Time-as-a-Service. Quartz allows each geo-distributed application component to specify its timing requirements, and it autonomously orchestrates the underlying infrastructure to meet them. Centered around a shared virtualized notion of time based on the timeline abstraction [1], Quartz provides an API that makes it easy to develop time-based geo-distributed applications. Through this API, Quartz reports the timing uncertainty, i.e., the delivered Quality of Time (QoT) [1], back to each application, enabling it to remain fault-tolerant in the face of clock-synchronization failures. Quartz is designed for containerized applications, features a distributed architecture, and is implemented using containerized micro-services. Experimental evaluations on real-world embedded, edge, and cloud platforms highlight the performance and scalability of our architecture.
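To make the idea of feeding timing uncertainty back to the application concrete, the following is a minimal Python sketch. It is not the Quartz API: the names `QoTRequirement`, `UncertainTimestamp`, `definitely_before`, and `coordinate` are hypothetical, as is the assumption that the delivered uncertainty takes the form of lower and upper bounds around a time estimate. The sketch shows an application that orders two events only when their uncertainty intervals do not overlap, and falls back to a degraded mode when the delivered uncertainty exceeds its stated requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoTRequirement:
    """Hypothetical timing requirement declared by an application component."""
    accuracy_ns: int  # largest tolerable uncertainty, in nanoseconds


@dataclass(frozen=True)
class UncertainTimestamp:
    """Hypothetical timestamp carrying the uncertainty delivered with it."""
    estimate_ns: int
    lower_bound_ns: int  # true time assumed >= estimate_ns - lower_bound_ns
    upper_bound_ns: int  # true time assumed <= estimate_ns + upper_bound_ns

    @property
    def earliest(self) -> int:
        return self.estimate_ns - self.lower_bound_ns

    @property
    def latest(self) -> int:
        return self.estimate_ns + self.upper_bound_ns

    def meets(self, req: QoTRequirement) -> bool:
        # The requirement holds if neither bound exceeds the tolerated error.
        return max(self.lower_bound_ns, self.upper_bound_ns) <= req.accuracy_ns


def definitely_before(a: UncertainTimestamp, b: UncertainTimestamp) -> bool:
    """True only if a precedes b for every true time within both intervals."""
    return a.latest < b.earliest


def coordinate(a: UncertainTimestamp, b: UncertainTimestamp,
               req: QoTRequirement) -> str:
    """Order two events, or report why a safe ordering is not possible."""
    if not (a.meets(req) and b.meets(req)):
        return "degraded"    # uncertainty too large, e.g. synchronization lost
    if definitely_before(a, b):
        return "a_first"
    if definitely_before(b, a):
        return "b_first"
    return "concurrent"      # intervals overlap, so the order is ambiguous


if __name__ == "__main__":
    req = QoTRequirement(accuracy_ns=50)
    a = UncertainTimestamp(1000, 10, 10)
    print(coordinate(a, UncertainTimestamp(1100, 10, 10), req))
    print(coordinate(a, UncertainTimestamp(1015, 10, 10), req))
    print(coordinate(UncertainTimestamp(1000, 500, 500), a, req))
```

The design choice illustrated here is that the application never trusts a bare timestamp: every decision is made on the interval, so a synchronization failure shows up as wider bounds and is handled explicitly rather than silently producing a wrong ordering.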

[1] John Enright, et al. Optimization and Coordinated Autonomy in Mobile Fulfillment Systems, 2011, Automated Action Planning for Autonomous Mobile Robots.

[2] Paramvir Bahl, et al. The Case for VM-Based Cloudlets in Mobile Computing, 2009, IEEE Pervasive Computing.

[3] Tal Mizrahi, et al. Serving time in the cloud: Why time-as-a-service?, 2016, 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

[4] Ragunathan Rajkumar, et al. QuartzV: Bringing Quality of Time to Virtual Machines, 2018, 2018 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[5] Barbara Liskov, et al. Practical uses of synchronized clocks in distributed systems, 1991, PODC '91.

[7] Christopher Frost, et al. Spanner: Google's Globally-Distributed Database, 2012, OSDI.

[8] Raja Lavanya, et al. Fog Computing and Its Role in the Internet of Things, 2019, Advances in Computer and Electrical Engineering.

[9] Xiao Zhang, et al. Microgrid Losses: When the Whole Is Greater Than the Sum of Its Parts, 2016, 2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS).

[10] Ragunathan Rajkumar, et al. Time-based Coordination in Geo-Distributed Cyber-Physical Systems, 2017, HotCloud.

[11] Andrea Bondavalli, et al. Safe estimation of time uncertainty of local clocks, 2009, 2009 International Symposium on Precision Clock Synchronization for Measurement, Control and Communication.

[12] Hermann Kopetz, et al. The time-triggered architecture, 1998, Proceedings First International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC '98).

[13] Janos J. Gertler, et al. Analytical Redundancy Methods in Fault Detection and Isolation, 1991.

[14] Aura Ganz, et al. SURGNET: An Integrated Surgical Data Transmission System for Telesurgery, 2009, International journal of telemedicine and applications.

[15] David W. Allan. Clock Characterization Tutorial, 1984.

[16] James M. Rehg, et al. Stampede: A Cluster Programming Middleware for Interactive Stream-Oriented Applications, 2003, IEEE Trans. Parallel Distributed Syst.

[17] Kang Lee, et al. IEEE 1588 standard for a precision clock synchronization protocol for networked measurement and control systems, 2002, 2nd ISA/IEEE Sensors for Industry Conference.

[18] Insup Lee, et al. Cyber-physical systems: The next computing revolution, 2010, Design Automation Conference.

[19] David L. Mills, et al. Internet time synchronization: the network time protocol, 1991, IEEE Trans. Commun.

[20] Julien Ridoux, et al. Virtualize Everything but Time, 2010, OSDI.

[21] Amin Vahdat, et al. Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization, 2018, NSDI.

[22] Edward A. Lee, et al. Execution Strategies for PTIDES, a Programming Model for Distributed Embedded Systems, 2009, 2009 15th IEEE Real-Time and Embedded Technology and Applications Symposium.

[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[24] Hakim Weatherspoon, et al. Globally Synchronized Time via Datacenter Networks, 2016, SIGCOMM.

[25] Umakishore Ramachandran, et al. Stampede RT: Programming Abstractions for Live Streaming Applications, 2007, 27th International Conference on Distributed Computing Systems (ICDCS '07).

[26] Umakishore Ramachandran, et al. Persistent Temporal Streams, 2009, Middleware.

[27] Anthony Rowe, et al. Timeline: An Operating System Abstraction for Time-Aware Applications, 2016, 2016 IEEE Real-Time Systems Symposium (RTSS).

[28] Kang B. Lee, et al. Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2004.

[29] Béla Lantos, et al. Formation control of a large group of UAVs with safe path planning and obstacle avoidance, 2013, 2014 European Control Conference (ECC).

[30] Mahadev Satyanarayanan, et al. Scalable crowd-sourcing of video from mobile devices, 2013, MobiSys '13.

[31] Paramvir Bahl, et al. VideoEdge: Processing Camera Streams using Hierarchical Clusters, 2018, 2018 IEEE/ACM Symposium on Edge Computing (SEC).