CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores
Omer Khan | Masab Ahmad | Farrukh Hijaz | Qingchuan Shi
[1] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[2] David A. Bader,et al. Scalable and High Performance Betweenness Centrality on the GPU , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Omer Khan,et al. Efficient parallelization of path planning workload on single-chip shared-memory multicores , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[4] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[5] Steven A. Hofmeyr,et al. Load balancing on speed , 2010, PPoPP '10.
[6] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] Daniel J. Sorin,et al. Exploring memory consistency for massively-threaded throughput-oriented processors , 2013, ISCA.
[8] David A. Bader,et al. GTgraph : A Synthetic Graph Generator Suite , 2006 .
[9] George Kurian,et al. The locality-aware adaptive cache coherence protocol , 2013, ISCA.
[10] Chen Sun,et al. DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[11] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[12] David A. Bader,et al. Scalable Graph Exploration on Multicore Processors , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[14] Keshav Pingali,et al. Deterministic galois: on-demand, portable and parameterless , 2014, ASPLOS.
[15] Guy E. Blelloch,et al. GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.
[16] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[18] Jennifer Widom,et al. GPS: a graph processing system , 2013, SSDBM.
[19] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[20] Sudhakar Yalamanchili,et al. Characterization and analysis of dynamic parallelism in unstructured GPU applications , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[21] Jure Leskovec,et al. Discovering social circles in ego networks , 2012, ACM Trans. Knowl. Discov. Data.
[22] Bernard Gendron,et al. Parallel Branch-and-Bound Algorithms: Survey and Synthesis , 1994, Oper. Res..
[23] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[24] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[25] Keshav Pingali,et al. Lonestar: A suite of parallel irregular programs , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[26] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[27] Guy E. Blelloch,et al. Internally deterministic parallel algorithms can be fast , 2012 .
[28] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[29] Jure Leskovec,et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..
[30] Carlos Guestrin,et al. Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[31] Chen Sun,et al. Cross-layer Energy and Performance Evaluation of a Nanophotonic Manycore Processor System Using Real Application Workloads , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[32] Peter Wilson,et al. Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication , 2015, 2015 IEEE International Conference on Networking, Architecture and Storage (NAS).
[33] G. Edward Suh,et al. Application-aware deadlock-free oblivious routing , 2009, ISCA '09.
[34] Pradeep Dubey,et al. Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.
[35] Anantharaman Kalyanaraman,et al. Parallel Heuristics for Scalable Community Detection , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[36] Jure Leskovec,et al. Discovering social circles in ego networks , 2014 .
[37] James Reinders,et al. Intel Xeon Phi Coprocessor High Performance Programming , 2013 .
[38] P. J. Narayanan,et al. Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.
[39] Hee-Seok Kim,et al. Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[40] Thomas H. Cormen,et al. Introduction to algorithms [2nd ed.] , 2001 .
[41] Kevin Skadron,et al. Pannotia: Understanding irregular GPGPU graph applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[42] Kunle Olukotun,et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[43] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .