An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs
[1] Peter Zipf,et al. Optimization of Constant Matrix Multiplication with Low Power and High Throughput , 2017, IEEE Transactions on Computers.
[2] Wei Zhang,et al. FDR 2.0: A Low-Power Dynamically Reconfigurable Architecture and Its FinFET Implementation , 2015, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[3] Ray C. C. Cheung,et al. Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support , 2013, Microelectron. J.
[4] Jesús Grajal,et al. A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[5] Suhaib A. Fahmy,et al. Multipumping Flexible DSP Blocks for Resource Reduction on Xilinx FPGAs , 2017, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[6] Pasi Liljeberg,et al. NoC-AXI interface for FPGA-based MPSoC platforms , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[7] Viktor K. Prasanna,et al. High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs , 2007, IEEE Transactions on Parallel and Distributed Systems.
[8] Arnold Schönhage,et al. Schnelle Multiplikation großer Zahlen , 1971, Computing.
[9] Suhaib A. Fahmy,et al. Mapping for Maximum Performance on FPGA DSP Blocks , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[10] Martin Fürer,et al. Faster integer multiplication , 2007, STOC '07.
[11] Mário P. Véstias,et al. Parallel dot-products for deep learning on FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[12] Douglas L. Maskell,et al. The iDEA DSP Block-Based Soft Processor for FPGAs , 2014, TRETS.
[13] Inmaculada Pardines,et al. DSPONE48: A methodology for automatically synthesize HDL focus on the reuse of DSP slices , 2017, J. Parallel Distributed Comput.
[14] Jason Cong,et al. Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[15] Kenli Li,et al. A parallel computing method using blocked format with optimal partitioning for SpMV on GPU , 2018, J. Comput. Syst. Sci.
[16] Wei Zhang,et al. Fracturable DSP Block for Multi-context Reconfigurable Architectures , 2017, Circuits Syst. Signal Process.
[17] Vamsi Boppana,et al. A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform , 2016, IEEE Micro.
[18] Douglas L. Maskell,et al. Throughput oriented FPGA overlays using DSP blocks , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[19] Xin Zhou,et al. An Efficient Implementation of the Gradient-Based Hough Transform Using DSP Slices and Block RAMs on the FPGA , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[20] Dongdong Chen,et al. Area- and power-efficient iterative single/double-precision merged floating-point multiplier on FPGA , 2017, IET Comput. Digit. Tech.
[21] Viktor K. Prasanna,et al. Performance Modeling of Matrix Multiplication on 3D Memory Integrated FPGA , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[22] Weidong Wang,et al. HACO-F: An Accelerating HLS-Based Floating-Point Ant Colony Optimization Algorithm on FPGA , 2017.
[23] Suhaib A. Fahmy,et al. Minimizing DSP block usage through multi-pumping , 2015, 2015 International Conference on Field Programmable Technology (FPT).
[24] Kentaro Sano,et al. FPGA-based Stream Computing for High-Performance N-Body Simulation using Floating-Point DSP Blocks , 2017, HEART.
[25] Satoru Yamamoto,et al. FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks , 2017, IEEE Transactions on Parallel and Distributed Systems.