Characterization and analysis of dynamic parallelism in unstructured GPU applications
[1] Kevin Skadron, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[2] David A. Bader, et al. Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, Georgia Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings, 2013, Graph Partitioning and Graph Clustering.
[3] Yong Tang, et al. Gregex: GPU Based High Speed Regular Expression Matching Engine, 2011, 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing.
[4] Norman P. Jouppi, et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[5] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[6] Kevin Skadron, et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance, 2010, ISCA.
[7] Yi Yang, et al. CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications, 2015, Journal of Computer Science and Technology.
[8] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[9] Youcef Saad, et al. A Basic Tool Kit for Sparse Matrix Computations, 1990.
[10] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[11] John McHugh, et al. Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, 2000, TSEC.
[12] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[13] Keshav Pingali, et al. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, 2011.
[14] Yuni Xia, et al. GPU accelerated item-based collaborative filtering for big-data applications, 2013, 2013 IEEE International Conference on Big Data.
[15] Thomas Sangild Sørensen, et al. Real-time deformation of detailed geometry based on mappings to a less detailed physical simulation on the GPU, 2005, EGVE'05.
[16] Youcef Saad, et al. A Basic Tool Kit for Sparse Matrix Computations, 1990.
[17] Tor M. Aamodt, et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[18] Joshua A. Anderson, et al. General purpose molecular dynamics simulations fully implemented on graphics processing units, 2008, J. Comput. Phys.
[19] Sudhakar Yalamanchili, et al. A characterization and analysis of PTX kernels, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[20] A L Kuhl, et al. Thermodynamic States in Explosion Fields, 2009.
[21] David K. McAllister, et al. OptiX: a general purpose ray tracing engine, 2010, ACM Trans. Graph.
[22] Bruce K. Grace. Black-Scholes option pricing via genetic algorithms, 2000.
[23] Fei Wang, et al. Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism, 2013, IDEAL.
[24] Sudhakar Yalamanchili, et al. Relational algorithms for multi-bulk-synchronous processors, 2013, PPoPP '13.
[25] Michela Taufer, et al. Performance impact of dynamic parallelism on different clustering algorithms, 2013, Defense, Security, and Sensing.
[26] Kevin Skadron, et al. Pannotia: Understanding irregular GPGPU graph applications, 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).