With the introduction of low-power System on a Chip (SoC) processor architectures in enterprise server configurations, there is a growing need for software that supports the scale-out, data-intensive cloud applications deployed in today's data centers. In this paper, we describe the design and implementation of a low-latency, fully compliant user-space TCP/IP socket stack on a low-power SoC architecture. We demonstrate that this library can serve as the basis for “Big Data” applications that require both high throughput and low latency on a power-optimized platform. We specifically target cloud applications, such as memcached, that are built on runtimes seeing strong growth in developer communities and enterprise deployments, and whose I/O bottlenecks outweigh their compute requirements. On low-power, embedded-class SoC servers, these I/O bottlenecks can make it prohibitively expensive to meet the performance and scaling requirements of such applications, even when CPU efficiency and memory bandwidth are adequate. Our approach removes this bottleneck by combining the SoC-integrated Network Interface Cards (NICs) with user-space communication. This shortens the path length to data and saves the CPU cycles otherwise spent on context switching. Our experiments show sub-5 μs ping-pong latency for 8 B packets. On the memslap benchmark, the stack also delivers substantial improvements in two comparisons. Against memcached running on the T4240 SoC with the kernel stack, it performs 3.5 times better for 16 B SETs. Against a standard x86_64 server with ConnectX 10GbE adapters, it improves by close to a factor of 2 when power-normalized metrics are used.
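The comparison against the x86_64 server depends on how throughput is normalized by power. The abstract does not define its power-normalized metric. The minimal sketch below therefore assumes one common definition: throughput divided by average measured system power, giving operations per watt (equivalently, operations per joule). The class `Measurement` and the functions `raw_speedup` and `power_normalized_speedup` are hypothetical names for illustration, not the authors' methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmark result for a system under test.

    ops_per_sec: measured throughput (e.g. memcached SETs per second).
    watts: average wall power drawn during the run.
    """
    ops_per_sec: float
    watts: float

    def ops_per_watt(self) -> float:
        # ASSUMPTION: "power-normalized" = throughput / average power,
        # i.e. (ops/s) / (J/s) = operations per joule.
        if self.watts <= 0:
            raise ValueError("power must be positive")
        return self.ops_per_sec / self.watts


def raw_speedup(candidate: Measurement, baseline: Measurement) -> float:
    """Plain throughput ratio; ignores how much power each system draws."""
    return candidate.ops_per_sec / baseline.ops_per_sec


def power_normalized_speedup(candidate: Measurement, baseline: Measurement) -> float:
    """Ratio of ops-per-watt figures.

    A result near 2 means the candidate completes about twice as many
    operations per joule as the baseline, even if its raw throughput
    is lower.
    """
    return candidate.ops_per_watt() / baseline.ops_per_watt()
```

As a purely hypothetical example, a platform with half the raw throughput of the baseline at a quarter of its power would have a `raw_speedup` of 0.5 but a `power_normalized_speedup` of 2.0. This is why the two metrics can rank the same pair of systems differently.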