From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization
Jürgen Teich | Frank Hannig | Bo Qiao | Oliver Reiche
[1] Dorit S. Hochbaum, et al. A Polynomial Algorithm for the k-cut Problem for Fixed k, 1994, Math. Oper. Res.
[2] Wei Yi, et al. Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU, 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[3] H. Jensen. Night Rendering, 2000.
[4] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.
[5] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[6] Giovanni Ramponi, et al. A cubic unsharp masking technique for contrast enhancement, 1998, Signal Process.
[7] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[8] Mechthild Stoer, et al. A simple min-cut algorithm, 1997, JACM.
[9] Sudhakar Yalamanchili, et al. Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[10] Shirish Tatikonda, et al. On optimizing machine learning workloads via kernel fusion, 2015, PPoPP.
[11] Jürgen Teich, et al. Automatic Kernel Fusion for Image Processing DSLs, 2018, SCOPES.
[12] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[13] Uday Bondhugula, et al. PolyMage: Automatic Optimization for Image Processing Pipelines, 2015, ASPLOS.
[14] Ludek Matyska, et al. Optimizing CUDA code by kernel fusion: application on BLAS, 2013, The Journal of Supercomputing.
[15] Carlo Tomasi, et al. Good features to track, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[16] Jürgen Teich, et al. FPGA-based accelerator design from a domain-specific language, 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[17] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.
[18] Christopher G. Harris, et al. A Combined Corner and Edge Detector, 1988, Alvey Vision Conference.
[19] Kathryn S. McKinley, et al. A Parametrized Loop Fusion Algorithm for Improving Parallelism and Cache Locality, 1997, Comput. J.
[20] Fawnizu Azmadi Hussin, et al. Image Enhancement Using Geometric Mean Filter and Gamma Correction for WCE Images, 2014, ICONIP.
[21] Jonathan Ragan-Kelley, et al. Automatically scheduling Halide image processing pipelines, 2016, ACM Trans. Graph.
[22] Jürgen Teich, et al. HIPAcc: A Domain-Specific Language and Compiler for Image Processing, 2016, IEEE Transactions on Parallel and Distributed Systems.
[23] Antje Baer, et al. Handbook of Medical Image Processing and Analysis, 2016.
[24] Mark J. Shensa, et al. The discrete wavelet transform: wedding the à trous and Mallat algorithms, 1992, IEEE Trans. Signal Process.