An OpenACC Optimizer for Accelerating Histogram Computation on a GPU
[1] Amirali Baniasadi, et al. Employing Software-Managed Caches in OpenACC, 2016, ACM Trans. Model. Perform. Evaluation Comput. Syst.
[2] Rudolf Eigenmann, et al. OpenMPC: extended OpenMP for efficient programming and tuning on GPUs, 2013, Int. J. Comput. Sci. Eng.
[3] Fumihiko Ino, et al. Sequence Homology Search Using Fine Grained Cycle Sharing of Idle GPUs, 2012, IEEE Transactions on Parallel and Distributed Systems.
[4] Rodney A. Kennedy, et al. Efficient Histogram Algorithms for NVIDIA CUDA Compatible Devices, 2007.
[5] Mitsuhisa Sato, et al. XcalableACC: Extension of XcalableMP PGAS Language Using OpenACC for Accelerator Clusters, 2014, 2014 First Workshop on Accelerator Programming using Directives.
[6] Krista A. Ehinger, et al. SUN database: Large-scale scene recognition from abbey to zoo, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[7] Satoshi Matsuoka, et al. An OpenACC Extension for Data Layout Transformation, 2014, 2014 First Workshop on Accelerator Programming using Directives.
[8] Henk Corporaal, et al. High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs, 2011, GPGPU-4.
[9] Toru Fujiwara, et al. Enumerating Joint Weight of a Binary Linear Code Using Parallel Architectures: multi-core CPUs and GPUs, 2015, Int. J. Netw. Comput.
[10] Richard W. Vuduc, et al. Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization, 2009, LCPC.
[11] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[12] Fumihiko Ino, et al. Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA, 2014, IEEE Journal of Biomedical and Health Informatics.
[13] Fumihiko Ino, et al. Accelerating the Smith-Waterman algorithm with interpair pruning and band optimization for the all-pairs comparison of base sequences, 2015, BMC Bioinformatics.
[14] Mitsuhisa Sato, et al. Productivity and Performance of Global-View Programming with XcalableMP PGAS Language, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[15] P.V.C. Hough, et al. Machine Analysis of Bubble Chamber Pictures, 1959.
[16] Eddy Z. Zhang, et al. Massive atomics for massive parallelism on GPUs, 2014, ISMM '14.
[17] Anuj Agarwal, et al. Analysis of sleep traits in knockout mice from the large-scale KOMP2 population using a non-invasive, high-throughput piezoelectric system, 2015, BMC Bioinformatics.
[18] Alex Ramírez, et al. Parallelizing general histogram application for CUDA architectures, 2013, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).
[19] Fumihiko Ino, et al. High-performance cone beam reconstruction using CUDA compatible GPUs, 2010, Parallel Comput.
[20] Fumihiko Ino, et al. PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU, 2014.