Parallel real-time garbage collection of multiple heaps in reconfigurable hardware

Despite rapid increases in memory capacity, reconfigurable hardware is still programmed in a very low-level manner, generally without any dynamic allocation at all. This limits productivity, especially as larger chips encourage ever more complex designs. Prior work has shown that it is possible to implement a real-time collector in hardware and achieve stall-free operation, but at the price of severe restrictions on object layouts. We present the first hardware garbage collector capable of collecting multiple interconnected heaps, thereby allowing a rich set of object types. We show that, for a modest additional cost in logic and memory, we can support multiple heaps at a clock frequency competitive with monolithic, fixed-layout heaps. We evaluate the hardware design by synthesizing it for a Xilinx FPGA and using co-simulation to measure its run-time behavior over a set of four benchmarks. Even at high allocation and mutation rates, the collector sustains stall-free operation (100% minimum mutator utilization) with up to 4 interconnected heaps, while requiring only between 1.1 and 1.7 times the maximum live memory of the application.
