SODA: Stencil with Optimized Dataflow Architecture
Jason Cong | Peng Wei | Peipei Zhou | Yuze Chi
[1] Xuan Yang, et al. Programming Heterogeneous Systems from an Image Processing DSL, 2016, ACM Trans. Archit. Code Optim.
[2] Eduard Ayguadé, et al. Exploiting memory customization in FPGA for 3D stencil computations, 2009, 2009 International Conference on Field-Programmable Technology.
[3] Alan C. Bovik, et al. The Essential Guide to Image Processing, 2009, J. Electronic Imaging.
[4] Marco D. Santambrogio, et al. A polyhedral model-based framework for dataflow implementation on FPGA devices of Iterative Stencil Loops, 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[5] Tom Feist, et al. Vivado Design Suite, 2012.
[6] Satoshi Matsuoka, et al. Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL, 2018, FPGA.
[7] P. Sadayappan, et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[8] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[10] Jürgen Teich, et al. Generating FPGA-based image processing accelerators with Hipacc: (Invited paper), 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[11] Jason Cong, et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment, 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[12] Klaus Sutner, et al. Computation theory of cellular automata, 1998.
[13] Yun Liang, et al. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[14] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[15] J. Ramanujam, et al. A framework for enhancing data reuse via associative reordering, 2014, PLDI.
[16] Jason Cong, et al. Polyhedral-based data reuse optimization for configurable computing, 2013, FPGA '13.
[17] Nachiket Kapre, et al. Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors, 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.
[18] Satoshi Matsuoka, et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[19] Jason Cong, et al. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers, 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[20] Frédo Durand, et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines, 2012, ACM Trans. Graph.
[21] Greg Stitt, et al. Scalable Window Generation for the Intel Broadwell+Arria 10 and High-Bandwidth FPGA Systems, 2018, FPGA.
[22] Satoru Yamamoto, et al. Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth, 2014, IEEE Transactions on Parallel and Distributed Systems.
[23] Shih-Wei Liao, et al. Locality-Aware Scheduling for Stencil Code in Halide, 2016, 2016 45th International Conference on Parallel Processing Workshops (ICPPW).
[24] Jason Cong, et al. Optimizing memory hierarchy allocation with loop transformations for high-level synthesis, 2012, DAC Design Automation Conference 2012.
[25] Mingjie Lin, et al. Graph-Theoretically Optimal Memory Banking for Stencil-Based Computing Kernels, 2018, FPGA.
[26] Gerhard Wellein, et al. Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory, 2009, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[27] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time, 1992, International Journal of Parallel Programming.
[28] Pat Hanrahan, et al. Darkroom, 2014, ACM Trans. Graph.
[29] G. Roth, et al. Compiling Stencils in High Performance Fortran, 1997, ACM/IEEE SC 1997 Conference (SC'97).
[30] Jason Cong, et al. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers, 2014, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[31] Bradley C. Kuszmaul, et al. The pochoir stencil compiler, 2011, SPAA '11.
[32] Jason Cong, et al. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[33] Jason Cong, et al. Latte: Locality Aware Transformation for High-Level Synthesis, 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[34] Alan C. Bovik, et al. The Essential Guide to Video Processing, 2009, J. Electronic Imaging.