Leo: A Profile-Driven Dynamic Optimization Framework for GPU Applications
Karsten Schwan | Naila Farooqui | Yuan Yu | Christopher J. Rossbach
[1] Teresa H. Y. Meng, et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[2] Kunle Olukotun, et al. A Heterogeneous Parallel Framework for Domain-Specific Languages, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[3] Michael Isard, et al. Some sample programs written in DryadLINQ, 2009.
[4] Sudhakar Yalamanchili, et al. A characterization and analysis of PTX kernels, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[5] Michael Stonebraker, et al. Implementation techniques for main memory database systems, 1984, SIGMOD '84.
[6] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[7] Bruce K. Grace. Black-Scholes option pricing via genetic algorithms, 2000.
[8] Matthew Arnold, et al. A Survey of Adaptive Optimization in Virtual Machines, 2005, Proceedings of the IEEE.
[9] Kunle Olukotun, et al. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning, 2011, ICML.
[10] James Newsome, et al. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software, 2005, Network and Distributed System Security Symposium.
[11] Karsten Schwan, et al. Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[12] Kunle Olukotun, et al. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging, 2013, POPL.
[13] Eric Darve, et al. Liszt: A domain specific language for building portable mesh-based PDE solvers, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Xipeng Shen, et al. On-the-fly elimination of dynamic irregularities for GPU computing, 2011, ASPLOS XVI.
[15] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[16] Wen-mei W. Hwu, et al. DL: A data layout transformation system for heterogeneous computing, 2012, 2012 Innovative Parallel Computing (InPar).
[17] Sudhakar Yalamanchili, et al. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[18] Joshua A. Anderson, et al. General purpose molecular dynamics simulations fully implemented on graphics processing units, 2008, J. Comput. Phys.
[19] Bo Wu, et al. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU, 2013, PPoPP '13.
[20] Kurt Keutzer, et al. Copperhead: compiling an embedded data parallel language, 2011, PPoPP '11.
[21] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[22] Thomas Sangild Sørensen, et al. Real-time deformation of detailed geometry based on mappings to a less detailed physical simulation on the GPU, 2005, EGVE'05.
[23] Alexander Aiken, et al. Legion: Expressing locality and independence with logical regions, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] Jean-Philippe Martin, et al. Dandelion: a compiler and runtime for heterogeneous systems, 2013, SOSP.
[25] Idit Keidar, et al. GPUfs: Integrating a file system with GPUs, 2013, TOCS.