FLAIR: Accelerating Reads with Consistency-Aware Network Routing

We present FLAIR, a novel approach for accelerating read operations in leader-based consensus protocols. FLAIR leverages the capabilities of the new generation of programmable switches to serve reads from follower replicas without compromising consistency. The core of the new approach is a packet-processing pipeline that can track client requests and system replies, identify consistent replicas, and at line speed, forward read requests to replicas that can serve the read without sacrificing linearizability. An additional benefit of FLAIR is that it facilitates devising novel consistency-aware load balancing techniques. Following the new approach, we designed FlairKV, a key-value store atop Raft. FlairKV implements the processing pipeline using the P4 programming language. We evaluate the benefits of the proposed approach and compare it to previous approaches using a cluster with a Barefoot Tofino switch. Our evaluation indicates that, compared to state-of-the-art alternatives, the proposed approach can bring significant performance gains: up to 42% higher throughput and 35-97% lower latency for most workloads.

[1]  Fernando Pedone,et al.  NetPaxos: consensus at network speed , 2015, SOSR.

[2]  Leslie Lamport,et al.  Fast Paxos , 2006, Distributed Computing.

[3]  Christopher Frost,et al.  Spanner: Google's Globally-Distributed Database , 2012, OSDI.

[4]  Andrea C. Arpaci-Dusseau,et al.  NICE: Network-Integrated Cluster-Efficient Storage , 2017, HPDC.

[5]  Hui Ding,et al.  TAO: Facebook's Distributed Data Store for the Social Graph , 2013, USENIX Annual Technical Conference.

[6]  Marcin Paprzycki,et al.  Distributed Computing: Fundamentals, Simulations and Advanced Topics , 2001, Scalable Comput. Pract. Exp..

[7]  Flavio Paiva Junqueira,et al.  Zab: High-performance broadcast for primary-backup systems , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[8]  Torsten Hoefler,et al.  DARE: High-Performance State Machine Replication on RDMA Networks , 2015, HPDC.

[9]  Michael J. Freedman,et al.  Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads , 2009, USENIX Annual Technical Conference.

[10]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.

[11]  Xiaozhou Li,et al.  NetChain: Scale-Free Sub-RTT Coordination , 2018, NSDI.

[12]  Nate Foster,et al.  NetCache: Balancing Key-Value Stores with Fast In-Network Caching , 2017, SOSP.

[13]  Xiaozhou Li,et al.  Be Fast, Cheap and in Control with SwitchKV , 2016, NSDI.

[14]  Robert Griesemer,et al.  Paxos made live: an engineering perspective , 2007, PODC '07.

[15]  Jialin Li,et al.  Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering , 2016, OSDI.

[16]  Richard Wang,et al.  OpenFlow-Based Server Load Balancing Gone Wild , 2011, Hot-ICE.

[17]  Ian Rae,et al.  F1: A Distributed SQL Database That Scales , 2013, Proc. VLDB Endow..

[18]  Jialin Li,et al.  Designing Distributed Systems Using Approximate Synchrony in Data Center Networks , 2015, NSDI.

[19]  David R. Cheriton,et al.  Leases: an efficient fault-tolerant mechanism for distributed file cache consistency , 1989, SOSP '89.

[20]  Yawei Li,et al.  Megastore: Providing Scalable, Highly Available Storage for Interactive Services , 2011, CIDR.

[21]  Nick McKeown,et al.  OpenFlow: enabling innovation in campus networks , 2008, CCRV.

[22]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[23]  Marcos K. Aguilera,et al.  Consistency-based service level agreements for cloud storage , 2013, SOSP.

[24]  John K. Ousterhout,et al.  Exploiting Commutativity For Practical Fast Replication , 2017, NSDI.

[25]  Tal Garfinkel,et al.  XvMotion: Unified Virtual Machine Migration over Long Distance , 2014, USENIX Annual Technical Conference.

[26]  Russell J. Clark,et al.  Resonance: dynamic access control for enterprise networks , 2009, WREN '09.

[27]  Hans-Arno Jacobsen,et al.  PNUTS: Yahoo!'s hosted data serving platform , 2008, Proc. VLDB Endow..

[28]  Song Jiang,et al.  Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.

[29]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[30]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[31]  Byrav Ramamurthy,et al.  Network Innovation using OpenFlow: A Survey , 2014, IEEE Communications Surveys & Tutorials.

[32]  David G. Andersen,et al.  Paxos Quorum Leases: Fast Reads Without Sacrificing Writes , 2014, SoCC.

[33]  Accelerating Reads With In-Network Consistency-Aware Load Balancing , 2021, IEEE/ACM Transactions on Networking.

[34]  Keith Marzullo,et al.  Mencius: Building Efficient Replicated State Machine for WANs , 2008, OSDI.

[35]  David G. Andersen,et al.  There is more consensus in Egalitarian parliaments , 2013, SOSP.

[36]  David Mazières Paxos Made Practical , 2007 .

[37]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[38]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[39]  George Varghese,et al.  P4: programming protocol-independent packet processors , 2013, CCRV.

[40]  John K. Ousterhout,et al.  In Search of an Understandable Consensus Algorithm , 2014, USENIX ATC.

[41]  Barbara Liskov,et al.  Viewstamped Replication Revisited , 2012 .

[42]  Leslie Lamport,et al.  Paxos Made Simple , 2001 .

[43]  Jamie K. Wolff,et al.  P4 , 1987 .