Generating Configurable Hardware from Parallel Patterns
Kunle Olukotun | Christopher De Sa | Raghu Prabhakar | HyoukJoong Lee | Kevin J. Brown | David Koeplinger | Christos Kozyrakis
[1] Kunle Olukotun,et al. Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[2] Mingxing Tan,et al. ElasticFlow: A complexity-effective approach for pipelining irregular loop nests , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[3] Christian de Schryver,et al. FPGA Based Accelerators for Financial Applications , 2015 .
[4] Jinyang Li,et al. Spartan: A Distributed Array Framework with Smart Tiling , 2015, USENIX Annual Technical Conference.
[5] Mary W. Hall,et al. Loop and data transformations for sparse matrix code , 2015, PLDI.
[6] Eric S. Chung,et al. A reconfigurable fabric for accelerating large-scale datacenter services , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[7] M. Laurenzano,et al. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.
[8] Karin Strauss,et al. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware , 2015 .
[9] Kunle Olukotun,et al. Locality-Aware Mapping of Nested Parallel Patterns on GPUs , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[10] Udo Kebschull,et al. Biomedical image processing and reconstruction with dataflow computing on FPGAs , 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).
[11] Kunle Olukotun,et al. Hardware system synthesis from Domain-Specific Languages , 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).
[12] Yong Wang,et al. SDA: Software-defined accelerator for large-scale DNN systems , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[13] Feng Liu,et al. CGPA: Coarse-Grained Pipelined Accelerators , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[14] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst..
[15] Kunle Olukotun,et al. Composition and Reuse with Compiled Domain-Specific Languages , 2013, ECOOP.
[16] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.
[17] Dennis Shasha,et al. Locality Optimization for Data Parallel Programs , 2013, ArXiv.
[18] Jason Cong,et al. Polyhedral-based data reuse optimization for configurable computing , 2013, FPGA '13.
[19] David F. Bacon,et al. FPGA programming for the masses , 2013, CACM.
[20] Kunle Olukotun,et al. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging , 2013, POPL.
[21] Kevin J. Brown,et al. Optimizing data structures in high-level programs , 2013 .
[22] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett..
[23] John Wawrzynek,et al. Chisel: Constructing hardware in a Scala embedded language , 2012, DAC Design Automation Conference 2012.
[24] Oskar Mencer,et al. Finding the right level of abstraction for minimizing operational expenditure , 2011, WHPCF '11.
[25] Kunle Olukotun,et al. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning , 2011, ICML.
[26] Huseyin Seker,et al. FPGA implementation of K-means algorithm for bioinformatics application: An accelerated approach to clustering Microarray data , 2011, 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).
[27] Donald G. Bailey,et al. Design for Embedded Image Processing on FPGAs , 2011 .
[28] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[29] Kurt Keutzer,et al. Copperhead: compiling an embedded data parallel language , 2011, PPoPP '11.
[30] Uday Bondhugula,et al. Loop transformations: convexity, pruning and optimization , 2011, POPL '11.
[31] Joshua S. Auerbach,et al. Lime: a Java-compatible and synthesizable language for heterogeneous architectures , 2010, OOPSLA.
[32] M. Zaharia,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[33] Craig Chambers,et al. FlumeJava: easy, efficient data-parallel pipelines , 2010, PLDI '10.
[34] Albert Cohen,et al. The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.
[35] Viktor K. Prasanna,et al. High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware , 2008, IEEE Transactions on Computers.
[36] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[37] Cédric Bastoul,et al. Iterative optimization in the polyhedral model , 2008 .
[38] Satnam Singh,et al. Kiwi: Synthesis of FPGA Circuits from Parallel Programs , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.
[39] Jeff Mason,et al. CHiMPS: A C-level compilation flow for hybrid CPU-FPGA architectures , 2008, 2008 International Conference on Field Programmable Logic and Applications.
[40] Sadaf R. Alam,et al. Using FPGA Devices to Accelerate Biomolecular Simulations , 2007, Computer.
[41] Stephen A. Edwards,et al. The Challenges of Synthesizing Hardware from C-Like Languages , 2006, IEEE Design & Test of Computers.
[42] Wayne Luk,et al. Reconfigurable acceleration for Monte Carlo based financial simulation , 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005.
[43] Arvind. Bluespec: A language for hardware design, simulation, synthesis and verification Invited Talk , 2003, MEMOCODE.
[44] Randolph E. Harr,et al. Efficient pipelining of nested loops: unroll-and-squash , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[45] Jason Cong,et al. AutoPilot: A Platform-Based ESL Synthesis System , 2008 .
[46] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.
[47] Mary W. Hall,et al. CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .
[48] Samuel M. Brown,et al. Performance Comparison of Finite-difference Modeling On Cell, FPGA And Multi-core Computers , 2007 .
[49] Sadaf R. Alam,et al. Scientific Computing Beyond CPUs: FPGA implementations of common scientific kernels , 2005 .
[50] Ralf Hinze,et al. Haskell 98: A Non-strict, Purely Functional Language , 1999 .