Acceleration by Inline Cache for Memory-Intensive Algorithms on FPGA via High-Level Synthesis
[1] Ralph Wittig,et al. Performance and power of cache-based reconfigurable computing , 2009, FPGA '09.
[2] Asmaa G. Seliem,et al. Parallel Smith-Waterman Algorithm Hardware Implementation for Ancestors and Offspring Gene Tracer , 2016.
[3] M. S. Waterman,et al. Identification of common molecular subsequences , 1981, Journal of Molecular Biology.
[4] George A. Constantinides,et al. MATCHUP: Memory Abstractions for Heap Manipulating Programs , 2015, FPGA.
[5] Basilio B. Fraguela,et al. Adaptive line placement with the set balancing cache , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Witold R. Rudnicki,et al. An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Mark Horowitz,et al. 1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[8] Michael Meredith,et al. SystemCoDesigner—an automatic ESL synthesis approach by design space exploration and behavioral synthesis for streaming applications , 2009.
[9] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[10] Luciano Lavagno,et al. High Performance and Low Power Monte Carlo Methods to Option Pricing Models via High Level Design and Synthesis , 2016, 2016 European Modelling Symposium (EMS).
[11] Dionisios N. Pnevmatikatos,et al. FPGA implementation of a configurable cache/scratchpad memory with virtualized user-level RDMA capability , 2009, 2009 International Symposium on Systems, Architectures, Modeling, and Simulation.
[12] Takeo Kanade,et al. An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.
[13] Michael Garland,et al. Designing efficient sorting algorithms for manycore GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[14] Jason Cong,et al. An energy-efficient adaptive hybrid cache , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.
[15] Jiarong Tong,et al. A high utilization rate routing algorithm for modern FPGA , 2008, 2008 9th International Conference on Solid-State and Integrated-Circuit Technology.
[16] Hesham F. A. Hamed,et al. Parallel Smith-Waterman Algorithm Hardware Implementation for Ancestors and Offspring Gene Tracer , 2016, 2016 World Symposium on Computer Applications & Research (WSCAR).
[17] Luciano Lavagno,et al. Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL , 2016, FedCSIS.
[18] Luciano Lavagno,et al. Efficient FPGA Implementation of OpenCL High-Performance Computing Applications via High-Level Synthesis , 2017, IEEE Access.
[19] John Wawrzynek,et al. Exploiting Memory-Level Parallelism in Reconfigurable Accelerators , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[20] J.-Y. Bouguet,et al. Pyramidal implementation of the Lucas Kanade feature tracker , 1999.
[21] Joseph M. Lancaster,et al. A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP , 2007, 2007 International Conference on Field Programmable Logic and Applications.
[22] Pat Hanrahan,et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.
[23] Jason Helge Anderson,et al. Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[24] Michael Fingeroff,et al. High-Level Synthesis Blue Book , 2010.
[25] Yale N. Patt,et al. The V-Way cache: demand-based associativity via global replacement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[26] George A. Constantinides,et al. Custom-sized caches in application-specific memory hierarchies , 2015, 2015 International Conference on Field Programmable Technology (FPT).
[27] Kermin Fleming,et al. Leap scratchpads: automatic memory and cache management for reconfigurable logic , 2010, FPGA '11.
[28] Christian Haubelt,et al. SystemCoDesigner—an automatic ESL synthesis approach by design space exploration and behavioral synthesis for streaming applications , 2009, TODE.
[29] Lesley Shannon,et al. Design Space Exploration of L1 Data Caches for FPGA-Based Multiprocessor Systems , 2015, FPGA.
[30] Salim Hariri,et al. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing , 2002, IEEE Trans. Parallel Distributed Syst..
[31] Yong Dou,et al. 64-bit floating-point FPGA matrix multiplication , 2005, FPGA '05.