Co-optimisation of datapath and memory in outer loop pipelining

When targeting algorithms to FPGAs, both the array-to-memory assignment and the selection of data reuse structures should be considered to maximise performance. In this work we present an integer linear programming formulation for the combined problem of array-to-memory assignment and data reuse selection. We include a number of cost functions that can be minimised during memory optimisation, and show how these optimisations can be integrated into a loop pipelining framework to iteratively update the memory subsystem during scheduling. By co-optimising the datapath and memory subsystem we are able to produce near-optimal (fastest) solutions, with an upper bound on the distance from the optimal. Our results show an average speedup of up to 4x over a non-optimised memory subsystem when integrated into an existing outer loop pipelining framework.
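To make the array-to-memory assignment problem concrete, the following is a minimal sketch and not the formulation from the paper. It assumes each array has a size and a number of accesses per loop iteration, and each memory has a capacity and a number of ports; all names (`assign_arrays`, `bram0`, and so on) are hypothetical. An exhaustive search stands in for an integer linear programming solver, data reuse structures are not modelled, and the only cost considered is a port-limited lower bound on the initiation interval.

```python
from itertools import product
from math import ceil


def assign_arrays(arrays, memories):
    """Pick the assignment of arrays to memories with the smallest
    port-limited bound on the initiation interval (II).

    arrays:   {name: (size_words, accesses_per_iteration)}   (assumed model)
    memories: {name: (capacity_words, ports)}                (assumed model)
    Returns (ii_bound, {array: memory}) or None if nothing fits.
    """
    array_names = list(arrays)
    memory_names = list(memories)
    best = None
    # Enumerate every mapping of arrays to memories (exponential; sketch only).
    for choice in product(memory_names, repeat=len(array_names)):
        used = {m: 0 for m in memory_names}
        accesses = {m: 0 for m in memory_names}
        for a, m in zip(array_names, choice):
            used[m] += arrays[a][0]
            accesses[m] += arrays[a][1]
        # Capacity constraint: the arrays in a memory must fit in it.
        if any(used[m] > memories[m][0] for m in memory_names):
            continue
        # Each memory serves at most `ports` accesses per cycle, so the
        # busiest memory bounds the initiation interval from below.
        ii = max(ceil(accesses[m] / memories[m][1]) for m in memory_names)
        if best is None or ii < best[0]:
            best = (ii, dict(zip(array_names, choice)))
    return best


# Hypothetical example: three arrays, two dual-port block RAMs.
arrays = {"A": (1024, 2), "B": (1024, 1), "C": (256, 1)}
memories = {"bram0": (2048, 2), "bram1": (2048, 2)}
result = assign_arrays(arrays, memories)
print(result)
```

In this example the three arrays do not fit in a single memory, and placing the most heavily accessed array on its own balances the port pressure. An ILP formulation would express the same capacity and port constraints with binary assignment variables instead of enumerating every mapping.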
