LMC: Automatic Resource-Aware Program-Optimized Memory Partitioning

As FPGAs have grown in size and capacity, FPGA memory systems have become richer and more diverse in order to support the increased computational capacity of FPGA fabrics. Using these resources, and using them well, has become commensurately more difficult, especially for legacy designs ported from smaller, simpler FPGA systems. This growing complexity calls for resource-aware compilers that can make good use of memory resources on behalf of the programmer. In this work, we introduce the LEAP Memory Compiler (LMC), which synthesizes application-optimized cache networks for systems with multiple memory resources, enabling user programs to automatically take advantage of the expanded memory capabilities of modern FPGA systems. In our experiments, the optimized cache network achieves performance gains of up to 49% for throughput-oriented applications and 15% for latency-oriented applications, while increasing design area by less than 6% of the total chip area.
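The abstract does not say how LMC decides which memory resource serves which part of a program. Purely as an illustration of resource-aware partitioning, the sketch below assumes each memory client comes with a profiled traffic estimate (a hypothetical input) and that the goal is to balance traffic across several memory controllers. It uses the classic longest-processing-time (LPT) greedy heuristic. This is a stand-in technique, not LMC's actual algorithm, and the names `MemoryClient` and `partition_clients` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class MemoryClient:
    name: str
    traffic: float  # hypothetical: estimated memory accesses per cycle from profiling


def partition_clients(clients, num_controllers):
    """Assign clients to controllers with the greedy LPT heuristic.

    Clients are visited in order of decreasing traffic, and each one goes to
    the controller with the smallest accumulated traffic so far.
    """
    if num_controllers < 1:
        raise ValueError("need at least one memory controller")
    # Min-heap of (accumulated traffic, controller index); ties break on index.
    heap = [(0.0, i) for i in range(num_controllers)]
    heapq.heapify(heap)
    assignment = {i: [] for i in range(num_controllers)}
    # sorted() is stable, so equal-traffic clients keep their input order.
    for client in sorted(clients, key=lambda c: c.traffic, reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded controller
        assignment[idx].append(client.name)
        heapq.heappush(heap, (load + client.traffic, idx))
    loads = {idx: load for load, idx in heap}
    return assignment, loads


if __name__ == "__main__":
    clients = [MemoryClient("A", 5), MemoryClient("B", 4), MemoryClient("C", 3),
               MemoryClient("D", 3), MemoryClient("E", 1)]
    assignment, loads = partition_clients(clients, 2)
    print(assignment)  # {0: ['A', 'D'], 1: ['B', 'C', 'E']}
    print(loads)
```

LPT is a common first choice for this kind of load balancing because it is simple and its worst-case maximum load is within a factor of 4/3 − 1/(3m) of optimal for m controllers. A real memory compiler would likely also weigh access latency, locality, and on-chip area. Those factors are outside this sketch.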
