A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms
Jian Weng | Tony Nowatzki | Sihao Liu | Zhengrong Wang | Vidushi Dadu
[1] Shanzhi Chen,et al. The requirements, challenges, and technologies for 5G of terrestrial mobile telecommunication , 2014, IEEE Communications Magazine.
[2] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[3] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.
[4] P. B. Darwood,et al. LMMSE chip equalisation for 3GPP WCDMA downlink receivers with channel coding , 2001, ICC 2001. IEEE International Conference on Communications. Conference Record (Cat. No.01CH37240).
[5] André DeHon,et al. MATRIX: a reconfigurable computing architecture with configurable instruction distribution and deployable resources , 1996, 1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.
[6] Ali Saidi,et al. The Reconfigurable Streaming Vector Processor (RSVP) , 2003 .
[7] Alec Roelke. RISC5: Implementing the RISC-V ISA in gem5 , 2017 .
[8] Scott A. Mahlke,et al. CGRA express: accelerating execution using dynamic operation fusion , 2009, CASES '09.
[9] Rudy Lauwereins,et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.
[10] Jason Cong,et al. A Fully Pipelined and Dynamically Composable Architecture of CGRA , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[11] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[12] Rudy Lauwereins,et al. ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.
[13] William J. Dally,et al. A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[14] George Carayannis,et al. Speech enhancement from noise: A regenerative approach , 1991, Speech Commun..
[15] Leibo Liu,et al. Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures , 2016, IEEE Transactions on Parallel and Distributed Systems.
[16] C. Batten,et al. Using Intra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Robert A. van de Geijn,et al. Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator , 2014, IEEE Transactions on Computers.
[18] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[19] Kunle Olukotun,et al. Plasticine: A reconfigurable architecture for parallel patterns , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Fadi J. Kurdahi,et al. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.
[21] Kunle Olukotun,et al. Generating Configurable Hardware from Parallel Patterns , 2015, ASPLOS.
[22] Yoav Etsion,et al. Inter-Thread Communication in Multithreaded, Reconfigurable Coarse-Grain Arrays , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Praveen Raghavan,et al. Energy-Efficient Communication Processors: Design and Implementation for Emerging Wireless Systems , 2013 .
[24] P. Glenn Gulak,et al. A low-complexity high-speed QR decomposition implementation for MIMO receivers , 2009, 2009 IEEE International Symposium on Circuits and Systems.
[25] Scott A. Mahlke,et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[26] Håkan Johansson,et al. Polyphase Decomposition of Digital Fractional-Delay Filters , 2015, IEEE Signal Processing Letters.
[27] Ruijie Zhao. WLS design of centro-symmetric 2-D FIR filters using matrix iterative algorithm , 2015, 2015 IEEE International Conference on Digital Signal Processing (DSP).
[28] Karthikeyan Sankaralingam,et al. Stream-dataflow acceleration , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[29] Robert A. van Engelen,et al. Efficient Symbolic Analysis for Optimizing Compilers , 2001, CC.
[30] Karthikeyan Sankaralingam,et al. Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[31] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[32] Kunle Olukotun,et al. REMARC: Reconfigurable Multimedia Array Coprocessor , 1999 .
[33] Yu Peng,et al. Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures , 2016, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[34] Lizy Kurian John,et al. Scaling to the end of silicon with EDGE architectures , 2004, Computer.
[35] Aviral Shrivastava,et al. REGIMap: Register-aware application mapping on Coarse-Grained Reconfigurable Architectures (CGRAs) , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[36] Seth Copen Goldstein,et al. Spatial computation , 2004, ASPLOS XI.
[37] F. Mintzer,et al. On half-band, third-band, and Nth-band FIR filters and their design , 1982 .
[38] Arvind,et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.
[39] A. Happonen,et al. DSP implementation of Cholesky decomposition , 2006, Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06..
[40] H. T. Kung,et al. The Warp Computer: Architecture, Implementation, and Performance , 1987, IEEE Transactions on Computers.
[41] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[42] C. Nicol. A Coarse Grain Reconfigurable Array (CGRA) for Statically Scheduled Data Flow Computing , 2017 .
[43] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[44] Seth Copen Goldstein,et al. PipeRench: a co/processor for streaming multimedia acceleration , 1999, ISCA.
[45] Christopher Batten,et al. Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators , 2013, ACM Trans. Comput. Syst..
[46] Yoav Etsion,et al. Control flow coalescing on a hybrid dataflow/von Neumann GPGPU , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[48] Jian Weng,et al. Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign , 2018, PACT.
[49] Steven Swanson,et al. Instruction scheduling for a tiled dataflow architecture , 2006, ASPLOS XII.
[50] Seth Copen Goldstein,et al. Dataflow: A Complement to Superscalar , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[51] Antonia Zhai,et al. Triggered instructions: a control paradigm for spatially-programmed architectures , 2013, ISCA.
[52] Carl Ebeling,et al. PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs , 1995, Third International ACM Symposium on Field-Programmable Gate Arrays.
[53] Raghuraman Mudumbai,et al. On the Feasibility of Distributed Beamforming in Wireless Networks , 2007, IEEE Transactions on Wireless Communications.
[54] Karthikeyan Sankaralingam,et al. Dataflow Predication , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[55] Tony Nowatzki,et al. Stream-based Memory Access Specialization for General Purpose Processors , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[56] Petar Popovski,et al. The METIS 5G System Concept: Meeting the 5G Requirements , 2016, IEEE Communications Magazine.
[57] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[58] Seth Copen Goldstein,et al. Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.
[59] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[60] James C. Hoe,et al. CoRAM++: Supporting data-structure-specific memory interfaces for FPGA computing , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).
[61] Edward A. Lee,et al. Hierarchical finite state machines with multiple concurrency models , 1999, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[62] Christoforos E. Kozyrakis,et al. Vector Lane Threading , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[63] Mingoo Seok,et al. Pipelining a Triggered Processing Element , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[64] Eduard Ayguadé,et al. Advanced Pattern based Memory Controller for FPGA based HPC applications , 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).
[65] Jongeun Lee,et al. Flattening-based mapping of imperfect loop nests for CGRAs , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).