Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax–Wendroff correction stencil
Shuaiwen Song | Guangwen Yang | Maryam Mehri Dehnavi | Yang You | Lin Gan | Haohuan Fu | Xiaomeng Huang