Labeled Network Stack: A High-Concurrency and Low-Tail Latency Cloud Server Framework for Massive IoT Devices

Internet of Things (IoT) applications maintain massive numbers of client connections to cloud servers, and the number of networked IoT devices is increasing rapidly. IoT services require both low tail latency and high concurrency in datacenters. This study aims to determine whether an order-of-magnitude improvement in tail latency and concurrency over mainstream systems is possible, by proposing a hardware–software codesigned labeled network stack (LNS) for future datacenters. The key innovation is a cross-layer payload labeling mechanism that distinguishes requests by their payload across the full network stack, including the application, TCP/IP, and Ethernet layers. This design enables prioritized packet processing and forwarding along the full datapath, so that latency-insensitive requests cannot significantly interfere with high-priority requests. We build a prototype datacenter server to evaluate the LNS design against a commercial x86 server and the mTCP research stack, using a cloud-supported IoT application scenario. Experimental results show that the LNS design can provide an order-of-magnitude improvement in tail latency and concurrency. A single datacenter server node can support over 2 million concurrent long-lived connections for IoT devices while maintaining a 99th-percentile tail latency of 50 ms. In addition, the hardware–software codesign approach substantially reduces the labeling and prioritization overhead and constrains the interference between high-priority and low-priority requests.
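
To make the idea of payload-based labeling and prioritized processing concrete, the following is a minimal software-only sketch in Python. It is not the LNS implementation, which is a hardware–software codesign spanning the application, TCP/IP, and Ethernet layers. The labeling rule (`label_for`, which treats payloads beginning with `ALERT` as latency-sensitive), the two priority levels, and the `LabeledQueue` class are all hypothetical names introduced here for illustration.

```python
import heapq
from itertools import count

# Hypothetical priority labels: a smaller value is served first.
HIGH, LOW = 0, 1


def label_for(payload: bytes) -> int:
    """Assign a label by inspecting the payload.

    Assumed rule for illustration only: payloads that start with
    b"ALERT" are latency-sensitive; everything else is low priority.
    """
    return HIGH if payload.startswith(b"ALERT") else LOW


class LabeledQueue:
    """Priority queue that orders requests by label, then arrival order."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()  # tie-breaker that preserves FIFO within a label

    def push(self, payload: bytes) -> None:
        heapq.heappush(self._heap, (label_for(payload), next(self._seq), payload))

    def pop(self) -> bytes:
        _label, _seq, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self) -> int:
        return len(self._heap)


def drain(requests):
    """Label and enqueue all requests, then return them in service order."""
    queue = LabeledQueue()
    for request in requests:
        queue.push(request)
    served = []
    while queue:
        served.append(queue.pop())
    return served


if __name__ == "__main__":
    arrivals = [b"LOG a", b"ALERT x", b"LOG b", b"ALERT y"]
    print(drain(arrivals))
```

In this sketch, high-priority requests are always served before queued low-priority ones, while arrival order is preserved within each label. A real datapath would apply the label at every layer rather than at a single queue, which is what the abstract attributes to the cross-layer design.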
