Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors
Yi Yang | Srimat T. Chakradhar | Tao Bao | Nishkam Ravi
[1] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[2] Surendra Byna, et al. Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory, 2010, SPAA '10.
[3] Albert Cohen, et al. Induction Variable Analysis with Delayed Abstractions, 2005, HiPEAC.
[4] François Irigoin, et al. Interprocedural Array Region Analyses, 1995, Int. J. Parallel Program.
[5] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[6] Edward T. Grochowski, et al. Larrabee: A Many-Core x86 Architecture for Visual Computing, 2008, IEEE Micro.
[7] Steven S. Lumetta, et al. CIGAR: Application Partitioning for a CPU/Coprocessor Architecture, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[8] Gary A. Kildall, et al. A unified approach to global program optimization, 1973, POPL.
[9] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[10] William H. Harrison, et al. Compiler Analysis of the Value Ranges for Variables, 1977, IEEE Transactions on Software Engineering.
[11] Rudolf Eigenmann, et al. A hybrid approach of OpenMP for clusters, 2012, PPoPP '12.
[12] Martin C. Rinard, et al. Symbolic bounds analysis of pointers, array indices, and accessed memory regions, 2005, TOPL.
[13] Steven S. Lumetta, et al. CUBA: an architecture for efficient CPU/co-processor data communication, 2008, ICS '08.
[14] David I. August, et al. Automatic CPU-GPU communication management and optimization, 2011, PLDI '11.
[15] Sudhakar Yalamanchili, et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[16] J. Ramanujam, et al. Automatic C-to-CUDA Code Generation for Affine Programs, 2010, CC.
[17] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[18] Andrew Richards, et al. Automatic Offloading of C++ for the Cell BE Processor: A Case Study Using Offload, 2010, 2010 International Conference on Complex, Intelligent and Software Intensive Systems.