Hardware accelerator design for data centers
Ozcan Ozturk | Steven M. Burns | Muhammet Mustafa Ozdal | Serif Yesil | Taemin Kim | Andrey Ayupov
[1] James C. Hoe,et al. GraphGen: An FPGA Framework for Vertex-Centric Graph Computation , 2014, FCCM 2014.
[2] Jason Cong,et al. Optimization of interconnects between accelerators and shared memories in dark silicon , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[3] Hideharu Amano,et al. A Performance Evaluation of CUBE: One-Dimensional 512 FPGA Cluster , 2010, ARC.
[4] Luka Daoud,et al. A Survey of High Level Synthesis Languages, Tools, and Compilers for Reconfigurable High Performance Computing , 2013, ICSS.
[5] Karthikeyan Sankaralingam,et al. Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[6] Kunle Olukotun,et al. Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.
[7] Jason Cong,et al. CHARM: a composable heterogeneous accelerator-rich microprocessor , 2012, ISLPED '12.
[8] Mikko H. Lipasti,et al. BenchNN: On the broad potential application scope of hardware neural network accelerators , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[9] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[10] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[11] Rob A. Rutenbar,et al. FPGA acceleration of Markov Random Field TRW-S inference for stereo matching , 2013, 2013 Eleventh ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2013).
[12] Carlos Guestrin,et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[13] Scott A. Mahlke,et al. Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] W. Luk,et al. Axel: a heterogeneous cluster with FPGAs and GPUs , 2010, FPGA '10.
[15] Michael Bedford Taylor,et al. Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.
[16] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.
[17] Scott A. Mahlke,et al. VEAL: Virtualized Execution Accelerator for Loops , 2008, 2008 International Symposium on Computer Architecture.
[18] Christopher Batten,et al. Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Jasmine Novak,et al. PageRank Computation and the Structure of the Web: Experiments and Algorithms , 2002 .
[20] J.M. Perez,et al. High memory throughput FPGA architecture for high-definition Belief-Propagation stereo matching , 2009, 2009 3rd International Conference on Signals, Circuits and Systems (SCS).
[21] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[22] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[23] Jianlong Zhong,et al. Medusa: A Parallel Graph Processing System on Graphics Processors , 2014, SIGMOD Rec.
[24] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[25] Onur Mutlu,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015 .
[26] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[27] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[28] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[29] James C. Hoe,et al. GraphGen: An FPGA Framework for Vertex-Centric Graph Computation , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[30] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[31] Séamas McGettrick,et al. An FPGA architecture for the Pagerank eigenvector problem , 2008, 2008 International Conference on Field Programmable Logic and Applications.
[32] Paul Chow,et al. ZCluster: A Zynq-based Hadoop cluster , 2013, 2013 International Conference on Field-Programmable Technology (FPT).
[33] Monica S. Lam,et al. SociaLite: Datalog extensions for efficient social network analysis , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[34] Luca P. Carloni,et al. An analysis of accelerator coupling in heterogeneous architectures , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[35] Feng Liu,et al. CGPA: Coarse-Grained Pipelined Accelerators , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[36] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[37] Bingsheng He,et al. Parallel Data Mining on Graphics Processors , 2011 .
[38] Yu Wang,et al. FPMR: MapReduce framework on FPGA , 2010, FPGA '10.
[39] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[40] Tsutomu Yoshinaga,et al. An FPGA-Based Tightly Coupled Accelerator for Data-Intensive Applications , 2014, 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs.
[41] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[42] Nachiket Kapre,et al. GraphStep: A System Architecture for Sparse-Graph Algorithms , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[43] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).