Auto-tuning for large-scale image processing by dynamic analysis method on multicore platforms
[1] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[2] Michael Allen, et al. Parallel programming: techniques and applications using networked workstations and parallel computers, 1998.
[3] Jeffrey Overbey, et al. ForOpenCL: transformations exploiting array syntax in Fortran for accelerator programming, 2011, Int. J. Comput. Sci. Eng.
[4] P. Sadayappan, et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[5] Christoph W. Kessler, et al. Automatic parallelization of simulation code for equation-based models with software pipelining and measurements on three platforms, 2009, CARN.
[6] R. Balasubramonian, et al. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-33).
[7] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Frank Dehne, et al. Communication issues in scalable parallel computing, 2009.
[9] Helmar Burkhart, et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[10] Rafael S. Parpinelli, et al. Population-based harmony search using GPU applied to protein structure prediction, 2014, Int. J. Comput. Sci. Eng.
[11] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[12] Oge Marques, et al. Practical Image and Video Processing Using MATLAB®, 2011.
[13] David A. Padua, et al. Programming for parallelism and locality with hierarchically tiled arrays, 2006, PPoPP '06.
[14] Chi-Bang Kuan, et al. Automated Empirical Optimization, 2011, Encyclopedia of Parallel Computing.
[15] David R. Liu, et al. Potent Delivery of Functional Proteins into Mammalian Cells in Vitro and in Vivo Using a Supercharged Protein, 2010, ACS Chemical Biology.
[16] Scott E. Umbaugh, et al. Digital Image Processing and Analysis: Human and Computer Vision Applications with CVIPtools, 2011.
[17] Steffen Beich, et al. Digital Video and HDTV: Algorithms and Interfaces, 2016.
[18] Abdellatif Mtibaa, et al. Temporal partitioning of data flow graphs for reconfigurable architectures, 2014, Int. J. Comput. Sci. Eng.
[19] Kiyoharu Aizawa, et al. Image Processing Technologies: Algorithms, Sensors, and Applications, 2004.
[20] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[21] Kai Li, et al. Thread scheduling for cache locality, 1996, ASPLOS VII.
[22] Gregory G. Slabaugh, et al. Multicore Image Processing with OpenMP [Applications Corner], 2010, IEEE Signal Processing Magazine.
[23] Frédo Durand, et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines, 2012, ACM Trans. Graph.
[24] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[25] Frank Mueller, et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters, 2012, CGO '12.
[26] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008, J. Parallel Distributed Comput.
[27] Robert J. Fowler, et al. Modeling memory concurrency for multi-socket multi-core systems, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[28] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[29] Kevin Skadron, et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations, 2011, International Journal of Parallel Programming.
[30] Ruay-Shiung Chang, et al. Simplifying MapReduce Data Processing, 2011, 2011 Fourth IEEE International Conference on Utility and Cloud Computing.