Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures
Wenguang Chen | Bingsheng He | Jidong Zhai | Feng Zhang | Shuhao Zhang