FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures
暂无分享,去创建一个
Wenguang Chen | Bingsheng He | Jidong Zhai | Bo Wu | Feng Zhang | Bingsheng He | Wenguang Chen | Jidong Zhai | Bo Wu | Feng Zhang
[1] Airfares 2002Q,et al. MULTIPLE LINEAR REGRESSION , 2006, Statistical Methods for Biomedical Research.
[2] Scott A. Mahlke,et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[3] Kenli Li,et al. Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling , 2015, IEEE Transactions on Parallel and Distributed Systems.
[4] Mark J. Warshawsky,et al. A Modern Approach , 2005 .
[5] R. Govindarajan,et al. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices , 2014, CGO '14.
[6] Jure Leskovec,et al. {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .
[7] K. Upton,et al. A modern approach , 1995 .
[8] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[9] Karthik Nilakant,et al. On the Efficacy of APUs for Heterogeneous Graph Computation , 2014 .
[10] Gagan Agrawal,et al. Accelerating MapReduce on a coupled CPU-GPU architecture , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Ninghui Sun,et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication , 2013, PLDI.
[12] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[13] Srinivasan Parthasarathy,et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Wenguang Chen,et al. To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures , 2015, 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.
[15] Michael F. P. O'Boyle,et al. Integrating profile-driven parallelism detection and machine-learning-based mapping , 2014, TACO.
[16] Bingsheng He,et al. Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture , 2013, Proc. VLDB Endow..
[17] Ching-Yung Lin,et al. GraphBIG: understanding graph computing in the context of industrial solutions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Denis Barthou,et al. Automatic OpenCL Task Adaptation for Heterogeneous Architectures , 2016, Euro-Par.
[19] Joseph L. Greathouse,et al. Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.
[21] Mitesh R. Meswani,et al. Efficient breadth-first search on a heterogeneous processor , 2014, 2014 IEEE International Conference on Big Data (Big Data).
[22] Shengen Yan,et al. yaSpMV: yet another SpMV framework on GPUs , 2014, PPoPP.
[23] FaloutsosChristos,et al. Random walk with restart: fast solutions and applications , 2008 .
[24] Keshav Pingali,et al. Adaptive heterogeneous scheduling for integrated GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[25] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[26] Matei Ripeanu,et al. A yoke of oxen and a thousand chickens for heavy lifting graph processing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Wenguang Chen,et al. Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures , 2017, IEEE Transactions on Parallel and Distributed Systems.
[28] Weifeng Liu,et al. Parallel Transposition of Sparse Data Structures , 2016, ICS.
[29] Kurt Keutzer,et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs , 2012, ICS '12.
[30] Ben Sander,et al. Applying AMD's Kaveri APU for heterogeneous computing , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[31] Rajkishore Barik,et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs , 2014, CGO '14.
[32] Srinivasan Parthasarathy,et al. Automatic Selection of Sparse Matrix Representation on GPUs , 2015, ICS.
[33] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[34] Tarek S. Abdelrahman,et al. Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit , 2013, 2013 42nd International Conference on Parallel Processing.
[35] Brian W. Barrett,et al. Introducing the Graph 500 , 2010 .
[36] Christos Faloutsos,et al. Random walk with restart: fast solutions and applications , 2008, Knowledge and Information Systems.
[37] Michael F. P. O'Boyle,et al. Automatic Feature Generation for Machine Learning Based Optimizing Compilation , 2009, 2009 International Symposium on Code Generation and Optimization.
[38] Christos Faloutsos,et al. R-MAT: A Recursive Model for Graph Mining , 2004, SDM.
[39] Brian Vinter,et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.