Unveiling kernel concurrency in multiresolution filters on GPUs with an image processing DSL
Jürgen Teich | Frank Hannig | Oliver Reiche | Bo Qiao
[1] Frédo Durand, et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines, 2012, ACM Trans. Graph.
[2] Jürgen Teich, et al. From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[3] Yun Liang, et al. Efficient GPU Spatial-Temporal Multitasking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[4] Torsten Hoefler, et al. Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot, 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Won Woo Ro, et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Fue-Sang Lien, et al. Parallel Adaptive Mesh Refinement Combined with Additive Multigrid for the Efficient Solution of the Poisson Equation, 2012.
[7] Hao Li, et al. Performance modeling in CUDA streams - A means for high-throughput data processing, 2014, 2014 IEEE International Conference on Big Data (Big Data).
[8] Jürgen Teich, et al. HIPAcc: A Domain-Specific Language and Compiler for Image Processing, 2016, IEEE Transactions on Parallel and Distributed Systems.
[9] Jan Modersitzki, et al. FAIR: Flexible Algorithms for Image Registration, 2009.
[10] Uday Bondhugula, et al. PolyMage: Automatic Optimization for Image Processing Pipelines, 2015, ASPLOS.
[11] Edward H. Adelson, et al. A multiresolution spline with application to image mosaics, 1983, TOGS.
[12] Robert J. Harrison, et al. A Domain-Specific Compiler for a Parallel Multiresolution Adaptive Numerical Simulation Environment, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Michael Unser, et al. Multiresolution image registration procedure using spline pyramids, 1993, Optics & Photonics.
[14] Ming Yang, et al. Inferring the Scheduling Policies of an Embedded CUDA GPU, 2017.
[15] Jan Kautz, et al. Local Laplacian filters, 2015, Commun. ACM.
[16] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[17] Edward H. Adelson, et al. The Laplacian Pyramid as a Compact Image Code, 1983, IEEE Trans. Commun.
[18] Rami G. Melhem, et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[19] Huiyang Zhou, et al. Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution, 2019, ACM Trans. Archit. Code Optim.
[20] Ming Zhang, et al. Multiresolution Bilateral Filtering for Image Denoising, 2008, IEEE Transactions on Image Processing.
[21] Jürgen Teich, et al. Towards a performance-portable description of geometric multigrid algorithms using a domain-specific language, 2014, J. Parallel Distributed Comput.
[22] Til Aach, et al. Nonlinear multiresolution gradient adaptive filter for medical images, 2003, SPIE Medical Imaging.
[23] Jan Kautz, et al. Local Laplacian filters: edge-aware image processing with a Laplacian pyramid, 2011, ACM Trans. Graph.
[24] Scott A. Mahlke, et al. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs, 2017, ASPLOS.
[25] Roberto Manduchi, et al. Bilateral filtering for gray and color images, 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[26] Jürgen Teich, et al. Automatic Kernel Fusion for Image Processing DSLs, 2018, SCOPES.