[1] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[2] Kunle Olukotun,et al. Generating Configurable Hardware from Parallel Patterns , 2015, International Conference on Architectural Support for Programming Languages and Operating Systems.
[3] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[4] Jian Weng,et al. Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign , 2018, PACT.
[5] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[6] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[7] Seth Copen Goldstein,et al. Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.
[8] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[9] T. Knight,et al. Pathfinder: A Negotiation-Based Performance-Driven Router for FPGAs , 2012 .
[10] A. Happonen,et al. DSP implementation of Cholesky decomposition , 2006, Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06.
[11] Raghuraman Mudumbai,et al. On the Feasibility of Distributed Beamforming in Wireless Networks , 2007, IEEE Transactions on Wireless Communications.
[12] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[13] James C. Hoe,et al. CoRAM++: Supporting data-structure-specific memory interfaces for FPGA computing , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).
[14] C. Batten,et al. Using Intra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] William J. Dally,et al. A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[16] Robert A. van de Geijn,et al. Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator , 2014, IEEE Transactions on Computers.
[17] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[18] Alec Roelke. RISC5: Implementing the RISC-V ISA in gem5 , 2017 .
[19] Kunle Olukotun,et al. Plasticine: A reconfigurable architecture for parallel patterns , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Fadi J. Kurdahi,et al. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.
[21] Yoav Etsion,et al. Inter-Thread Communication in Multithreaded, Reconfigurable Coarse-Grain Arrays , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] Håkan Johansson,et al. Polyphase Decomposition of Digital Fractional-Delay Filters , 2015, IEEE Signal Processing Letters.
[23] Ruijie Zhao. WLS design of centro-symmetric 2-D FIR filters using matrix iterative algorithm , 2015, 2015 IEEE International Conference on Digital Signal Processing (DSP).
[24] Karthikeyan Sankaralingam,et al. Stream-dataflow acceleration , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[25] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[26] Karthikeyan Sankaralingam,et al. Pushing the Limits of Accelerator Efficiency While Retaining Programmability , 2017 .
[27] Seth Copen Goldstein,et al. PipeRench: a co/processor for streaming multimedia acceleration , 1999, ISCA.
[28] Robert A. van Engelen,et al. Efficient Symbolic Analysis for Optimizing Compilers , 2001, CC.
[29] Praveen Raghavan,et al. Energy-Efficient Communication Processors: Design and Implementation for Emerging Wireless Systems , 2013 .
[30] P. Glenn Gulak,et al. A low-complexity high-speed QR decomposition implementation for MIMO receivers , 2009, 2009 IEEE International Symposium on Circuits and Systems.
[31] Scott A. Mahlke,et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[32] Jason Cong,et al. A Fully Pipelined and Dynamically Composable Architecture of CGRA , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[33] Antonia Zhai,et al. Triggered instructions: a control paradigm for spatially-programmed architectures , 2013, ISCA.
[34] Lizy Kurian John,et al. Scaling to the end of silicon with EDGE architectures , 2004, Computer.
[35] F. Mintzer,et al. On half-band, third-band, and Nth-band FIR filters and their design , 1982 .
[36] Christoforos E. Kozyrakis,et al. Vector Lane Threading , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[37] Mingoo Seok,et al. Pipelining a Triggered Processing Element , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Eduard Ayguadé,et al. Advanced Pattern based Memory Controller for FPGA based HPC applications , 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).
[39] P. B. Darwood,et al. LMMSE chip equalisation for 3GPP WCDMA downlink receivers with channel coding , 2001, ICC 2001. IEEE International Conference on Communications. Conference Record (Cat. No.01CH37240).
[40] Scott A. Mahlke,et al. Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[41] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] George Carayannis,et al. Speech enhancement from noise: A regenerative approach , 1991, Speech Commun..
[43] Ali Saidi,et al. The Reconfigurable Streaming Vector Processor (RSVP) , 2003 .
[44] Rudy Lauwereins,et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.
[45] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[46] Yoav Etsion,et al. Control flow coalescing on a hybrid dataflow/von Neumann GPGPU , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Andrew B. Kahng,et al. CACTI 7 , 2017, ACM Trans. Archit. Code Optim..
[48] Steven Swanson,et al. Instruction scheduling for a tiled dataflow architecture , 2006, ASPLOS XII.
[49] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[50] Gu-Yeon Wei,et al. Co-designing accelerators and SoC interfaces using gem5-Aladdin , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).