BLP: Applying ILP Techniques to Bytecode Execution

The popularity of Java has prompted a flurry of engineering and research activity aimed at improving the performance of Java Virtual Machine (JVM) implementations. This paper introduces the concept of bytecode-level parallelism (BLP), that is, data- and control-independent bytecodes that can be executed concurrently, as a vehicle for achieving substantial performance improvements in JVM implementations. It also describes a JVM architecture, JVM-BLP, that uses threads to exploit BLP. Measurements for several large Java programs show that the level of BLP can be as high as 14.564 independent instructions, with an average of 6.768.
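To make the notion of "independent instructions" concrete, the following is a minimal sketch of one way such a level of parallelism could be estimated for a straight-line sequence of operations. It is not the paper's measurement method: the `Op` class, the named locations, and the `averageParallelism` function are hypothetical, the operand stack and control flow are ignored, and each operation is described only by the locations it reads and writes. Each operation is scheduled in the earliest step after every operation it depends on, and the estimate is the operation count divided by the number of steps.

```java
import java.util.List;
import java.util.Set;

class BlpSketch {
    // Hypothetical simplified operation: just the locations it reads and writes.
    static final class Op {
        final String name;
        final Set<String> reads;
        final Set<String> writes;

        Op(String name, Set<String> reads, Set<String> writes) {
            this.name = name;
            this.reads = reads;
            this.writes = writes;
        }
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String s : a) {
            if (b.contains(s)) {
                return true;
            }
        }
        return false;
    }

    // True if 'later' must wait for 'earlier' because of a data dependence.
    static boolean dependsOn(Op later, Op earlier) {
        return intersects(later.reads, earlier.writes)    // read after write
            || intersects(later.writes, earlier.reads)    // write after read
            || intersects(later.writes, earlier.writes);  // write after write
    }

    // Operations per step when each operation runs as early as its
    // dependences allow (assumes unlimited execution resources).
    static double averageParallelism(List<Op> ops) {
        int[] step = new int[ops.size()];
        int depth = 0;
        for (int i = 0; i < ops.size(); i++) {
            step[i] = 1;
            for (int j = 0; j < i; j++) {
                if (dependsOn(ops.get(i), ops.get(j))) {
                    step[i] = Math.max(step[i], step[j] + 1);
                }
            }
            depth = Math.max(depth, step[i]);
        }
        return depth == 0 ? 0.0 : (double) ops.size() / depth;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(
            new Op("a = x + y", Set.of("x", "y"), Set.of("a")),
            new Op("b = z * 2", Set.of("z"), Set.of("b")),
            new Op("c = w - 1", Set.of("w"), Set.of("c")),
            new Op("d = a + b", Set.of("a", "b"), Set.of("d")));
        // The first three operations are mutually independent; the fourth
        // needs a and b, so 4 operations fit in 2 steps.
        System.out.println(averageParallelism(ops)); // prints 2.0
    }
}
```

In this toy sequence the first three operations share no locations and could run concurrently, while the last must wait for two of them. A real analysis of bytecodes would also have to account for operand-stack dependences and branches, which this sketch deliberately leaves out.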
