Light-weight remote communication for high-performance cloud networks

In this paper, we present early experiences with libRIPC, a light-weight communication library for high-performance cloud networks. Future cloud networks are expected to be tightly interconnected and to offer capabilities formerly reserved for high-performance computing. LibRIPC aims to bring the benefits of such architectures to heterogeneous cloud workloads. It is designed for a low footprint and easy integration, and it supports reconfiguration and mutually untrusted communication partners. LibRIPC offers short and long transmit primitives, optimized for control messages and bulk data transfer, respectively. Early experiments with a Java-based web server indicate that libRIPC integrates well into typical cloud workloads and speeds up larger data transfers by at least a factor of three compared to socket-based TCP/IP communication.
