Efficient Java RMI for parallel programming

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun's RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols.

A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application during runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol.

We evaluate the performance of Manta using benchmarks and applications that run on a 32-node Myrinet cluster. The time for a null-RMI (without parameters or a return value) of Manta is 35 times lower than for the Sun JDK 1.2, and only slightly higher than for a C-based RPC protocol. This high performance is accomplished by pushing almost all of the runtime overhead of RMI to compile time. We study the performance differences between the Manta and the Sun RMI protocols in detail. The poor performance of the Sun RMI protocol is in part due to an inefficient implementation of the protocol. To allow a fair comparison, we compiled the applications and the Sun RMI protocol with the native Manta compiler. The results show that Manta's null-RMI latency is still eight times lower than for the compiled Sun RMI protocol and that Manta's efficient RMI protocol results in 1.8 to 3.4 times higher speedups for four out of six applications.
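
The idea of moving serialization work from runtime to compile time can be illustrated with a small sketch. This is not Manta's generated code; the class `SerializerSketch`, the parameter type `Point`, and all method names are hypothetical, and the "specialized" serializer is written by hand to stand in for what a compiler with full type information might emit. The generic path inspects the object's fields through reflection on every call, while the specialized path writes the known fields directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

public class SerializerSketch {
    // Hypothetical parameter type of a remote method (assumption, not from the article).
    static final class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Generic path: the field layout is discovered at runtime on every call.
    static byte[] reflectiveWrite(Object obj) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Field[] fields = obj.getClass().getDeclaredFields();
        // Sort by name so the wire order does not depend on reflection order.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            // This sketch only handles non-static int fields.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()
                    || f.getType() != int.class) {
                continue;
            }
            f.setAccessible(true);
            out.writeInt(f.getInt(obj));
        }
        return bytes.toByteArray();
    }

    // Specialized path: the layout of Point is fixed in the code itself,
    // so no type inspection happens at runtime.
    static byte[] specializedWrite(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(8);
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(p.x);
        out.writeInt(p.y);
        return bytes.toByteArray();
    }

    // Matching specialized deserializer for the receiving side.
    static Point specializedRead(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int x = in.readInt();
        int y = in.readInt();
        return new Point(x, y);
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4);
        byte[] generic = reflectiveWrite(p);
        byte[] special = specializedWrite(p);
        Point copy = specializedRead(special);
        System.out.println("same bytes: " + Arrays.equals(generic, special));
        System.out.println("round trip: " + copy.x + "," + copy.y);
    }
}
```

Both paths produce the same eight bytes for a `Point`; the difference is where the knowledge of the layout lives. A specialized serializer only works for types known when it is generated, which is why classes that arrive at runtime through polymorphism need a separate mechanism such as the dynamic bytecode compiler described above.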
