ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels
暂无分享,去创建一个
[1] Gauthier Lafruit,et al. Cross-Based Local Stereo Matching Using Orthogonal Integral Images , 2009, IEEE Transactions on Circuits and Systems for Video Technology.
[2] Brucek Khailany,et al. CudaDMA: Optimizing GPU memory bandwidth via warp specialization , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] Mahmut T. Kandemir,et al. Compiler-directed scratch pad memory hierarchy design and management , 2002, DAC '02.
[4] William J. Dally,et al. A tuning framework for software-managed memory hierarchies , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Richard Szeliski,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.
[6] Yao Zhang,et al. Scan primitives for GPU computing , 2007, GH '07.
[7] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[8] Uday Bondhugula,et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories , 2008, PPoPP.
[9] Majid Sarrafzadeh,et al. A memory optimization technique for software-managed scratchpad memory in GPUs , 2009, 2009 IEEE 7th Symposium on Application Specific Processors.
[10] D. Scharstein,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).
[11] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[12] Kunle Olukotun,et al. A domain-specific approach to heterogeneous parallelism , 2011, PPoPP '11.