Performance Engineering of Numerical Software on Multi- and Manycore Processors
[1] Volker Strumpen,et al. The memory behavior of cache oblivious stencil computations , 2007, The Journal of Supercomputing.
[2] Matthias Christen,et al. Generating and auto-tuning parallel stencil codes , 2011 .
[3] A. Brandt,et al. Multi-level adaptive solutions to boundary-value problems , 1977, Mathematics of Computation.
[4] John von Neumann,et al. First draft of a report on the EDVAC , 1993, IEEE Annals of the History of Computing.
[5] Ulrich Rüde,et al. Fluid flow simulation on the Cell Broadband Engine using the lattice Boltzmann method , 2009, Comput. Math. Appl..
[6] Shmuel Peleg,et al. Seamless Image Stitching in the Gradient Domain , 2004, ECCV.
[7] Markus Kowarschik,et al. Data locality optimizations for iterative numerical algorithms and cellular automata on hierarchical memory architectures , 2004, Advances in simulation.
[8] William Jalby,et al. Hardware Performance Monitoring for the Rest of Us: A Position and Survey , 2011, NPC.
[9] Fan Yang,et al. Super-Resolution from One Single Low-Resolution Image Based on R-KSVD and Example-Based Algorithm , 2013, IDEAL.
[10] Michael Elad,et al. Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit , 2008 .
[11] Christian Weiß,et al. Data locality optimizations for multigrid methods on structured grids , 2001 .
[12] J. C. Jaeger,et al. Conduction of Heat in Solids , 1952 .
[13] Marcus Mohr,et al. Cell-centred multigrid revisited , 2004 .
[14] Y. C. Pati,et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.
[15] Michael Elad,et al. Image Decomposition via the Combination of Sparse Representations and a Variational Approach , 2022 .
[16] David G. Wonnacott,et al. Time Skewing for Parallel Computers , 1999, LCPC.
[17] Jack Dongarra,et al. SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3 , 2007 .
[18] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[19] Josef Weidendorfer,et al. Off-loading application controlled data prefetching in numerical codes for multi-core processors , 2008, Int. J. Comput. Sci. Eng..
[20] Harald Köstler,et al. An Orthogonal Matching Pursuit Algorithm for Image Denoising on the Cell Broadband Engine , 2009, PPAM.
[21] U. Rüde,et al. Simulation of Heat-Induced Elastic Deformation of Cylindrical-Shaped Bodies , 2010 .
[22] Nancy S. Pollard,et al. Real-time gradient-domain painting , 2008, ACM Trans. Graph..
[23] Gerhard Wellein,et al. Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results , 2011, ArXiv.
[24] Nicolas Legrand,et al. Analysis of Roll Gap Heat Transfers in Hot Steel Strip Rolling through Roll Temperature Sensors and Heat Transfer Models , 2012 .
[25] Balas K. Natarajan,et al. Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..
[26] T. Chan,et al. On the Convergence of the Lagged Diffusivity Fixed Point Method in Total Variation Image Restoration , 1999 .
[27] Thomas Zeiser,et al. Performance evaluation of a parallel sparse lattice Boltzmann solver , 2008, J. Comput. Phys..
[28] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[29] Juliane Junker,et al. Computer Organization and Design: The Hardware/Software Interface , 2016 .
[30] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[31] Ulrich Rüde,et al. A framework that supports in writing performance-optimized stencil-based codes , 2010 .
[32] H. D. Baehr,et al. Wärme- und Stoffübertragung , 1994 .
[33] J. Boon. The Lattice Boltzmann Equation for Fluid Dynamics and Beyond , 2003 .
[34] Harald Köstler,et al. Real-time simulation of temperature in hot rolling rolls , 2014, J. Comput. Sci..
[35] Ulrich Rüde,et al. A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters , 2010, Parallel Comput..
[36] Gerhard Wellein,et al. Introduction to High Performance Computing for Scientists and Engineers , 2010, Chapman and Hall / CRC computational science series.
[37] Harald Köstler,et al. Performance engineering to achieve real-time high dynamic range imaging , 2012, Journal of Real-Time Image Processing.
[38] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[39] A. Bruckstein,et al. K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .
[40] Gerhard Wellein,et al. Exploring performance and power properties of modern multi‐core chips via simple machine models , 2012, Concurr. Comput. Pract. Exp..
[41] Matthew Scarpino,et al. Programming the Cell Processor: For Games, Graphics, and Computation , 2008 .
[42] Gerhard Wellein,et al. Towards Optimal Performance for Lattice Boltzmann Applications on Terascale Computers , 2006 .
[43] Ulrich Rüde,et al. Fixed and Adaptive Cache Aware Algorithms for Multigrid Methods , 2000 .
[44] Achi Brandt,et al. Vectorized multigrid Poisson solver for the CDC CYBER 205 , 1983 .
[45] Dietmar Fey,et al. High Performance Stencil Code Algorithms for GPGPUs , 2011, ICCS.
[47] Jan Treibig,et al. Efficiency improvements of iterative numerical algorithms on modern architectures , 2008 .
[48] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[49] Wolfgang Hackbusch,et al. Multi-grid methods and applications , 1985, Springer series in computational mathematics.
[50] Gerhard Wellein,et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[51] Wolfgang Joppich,et al. Practical Fourier Analysis for Multigrid Methods , 2004 .
[52] Ibm Redbooks,et al. Programming the Cell Broadband Engine Architecture: Examples and Best Practices , 2008 .
[53] Gerhard Wellein,et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[54] Ken Kennedy,et al. Estimating Interlock and Improving Balance for Pipelined Architectures , 1988, J. Parallel Distributed Comput..
[55] Ulrich Rüde,et al. Modeling Multigrid Algorithms for Variational Imaging , 2010, 2010 21st Australian Software Engineering Conference.
[56] Jason N. Dale,et al. Cell Broadband Engine Architecture and its first implementation - A performance view , 2007, IBM J. Res. Dev..
[57] Thomas R. Braun,et al. An evaluation of GPU acceleration for sparse reconstruction , 2010, Defense + Commercial Sensing.
[58] Georg Hager,et al. Introducing a Performance Model for Bandwidth-Limited Loop Kernels , 2009, PPAM.
[59] Robert Strzodka,et al. Using GPUs to improve multigrid solver performance on a cluster , 2008, Int. J. Comput. Sci. Eng..
[60] Yehoshua Y. Zeevi,et al. Image enhancement and denoising by complex diffusion processes , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[62] Adolfy Hoisie,et al. Performance Optimization of Numerically Intensive Codes , 1987 .
[63] P. Wesseling. An Introduction to Multigrid Methods , 1992 .
[64] Diomidis Spinellis,et al. Code Quality: The Open Source Perspective , 2006 .
[65] Nils Thürey,et al. Physically based animation of free surface flows with the Lattice Boltzmann method , 2007 .
[66] Ulrich Rüde,et al. Fast Wavelet Transform Utilizing a Multicore-Aware Framework , 2010, PARA.
[67] Gerhard Wellein,et al. Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering , 2012, Euro-Par Workshops.