Accelerating Atmospheric Modeling Through Emerging Multi-core Technologies
[1] R. Turco,et al. SMVGEAR: A sparse-matrix, vectorized gear code for atmospheric models , 1994 .
[2] Edward T. Grochowski,et al. Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[3] E. Hairer,et al. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems , 2010 .
[4] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[5] Kue-Hwan Sihn,et al. Analysis and Parallelization of H.264 decoder on Cell Broadband Engine Architecture , 2007, 2007 IEEE International Symposium on Signal Processing and Information Technology.
[6] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[7] J. Verwer,et al. Analysis of operator splitting for advection-diffusion-reaction problems from air pollution modelling , 1999 .
[8] William C. Skamarock,et al. A time-split nonhydrostatic atmospheric model for weather research and forecasting applications , 2008, J. Comput. Phys..
[9] Adrian Sandu,et al. A communication library for the parallelization of air quality models on structured grids , 2002 .
[10] Himanshu Rawat,et al. Implementation of Spatial Domain Filters for Cell Broadband Engine , 2008, 2008 First International Conference on Emerging Trends in Engineering and Technology.
[11] Jordan G. Powers,et al. A Description of the Advanced Research WRF Version 2 , 2005 .
[12] Sam S. Stone,et al. MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores , 2011 .
[13] Adrian Sandu,et al. Scalable heterogeneous parallelism for atmospheric modeling and simulation , 2010, The Journal of Supercomputing.
[15] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[16] Jerome D. Fast,et al. Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) , 2008 .
[17] Nicholas J. Wright,et al. WRF nature run , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[18] Guang R. Gao,et al. Optimizing the Fast Fourier Transform on a Multi-core Architecture , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[19] Florian A. Potra,et al. The kinetic preprocessor KPP - a software environment for solving chemical kinetics , 2002 .
[20] Manish Vachharajani,et al. Deconstructing Hardware Usage for General Purpose Computation on GPUs , 2006 .
[21] Haitao Wei,et al. Loading OpenMP to Cell: An Effective Compiler Framework for Heterogeneous Multi-core Chip , 2007, IWOMP.
[22] L. K. Peters,et al. A second generation model for regional-scale transport/chemistry/deposition , 1986 .
[23] James Demmel,et al. LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs , 2008 .
[24] Tao Zhang,et al. Supporting OpenMP on Cell , 2008, International Journal of Parallel Programming.
[25] Wu-chun Feng,et al. The Green500 List , 2007 .
[26] Jack J. Dongarra,et al. Implementation of mixed precision in solving systems of linear equations on the Cell processor , 2007, Concurr. Comput. Pract. Exp..
[27] Timothy Mark Pinkston,et al. On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus , 2007, First International Symposium on Networks-on-Chip (NOCS'07).
[28] Sadaf R. Alam,et al. On the Path to Enable Multi-scale Biomolecular Simulations on PetaFLOPS Supercomputer with Multi-core Processors , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[29] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[30] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[31] Qing Wang,et al. Speech Codec Optimization Based on Cell Broadband Engine , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[32] H. H. Rosenbrock,et al. Some general implicit processes for the numerical solution of differential equations , 1963, Comput. J..
[33] Murali Krishna,et al. Feasibility study of MPI implementation on the heterogeneous multi-core cell BE™ architecture , 2007, SPAA '07.
[34] Adrian Sandu,et al. Implementation and evaluation of an array of chemical solvers in the Global Chemical Transport Model GEOS-Chem , 2009 .
[35] Jack J. Dongarra,et al. A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.
[36] Alexandros Stamatakis,et al. Dynamic multigrain parallelization on the cell broadband engine , 2007, PPoPP.
[37] Murali Krishna,et al. A Buffered-Mode MPI Implementation for the Cell BE™ Processor , 2007, International Conference on Computational Science.
[38] Gargi Dasgupta,et al. Transparent grid enablement of weather research and forecasting , 2008, Mardi Gras Conference.
[39] D. Jacob,et al. Global modeling of tropospheric chemistry with assimilated meteorology : Model description and evaluation , 2001 .
[40] B. Sportisse. An Analysis of Operator Splitting Techniques in the Stiff Case , 2000 .
[41] Greg Burns,et al. LAM: An Open Cluster Environment for MPI , 2002 .
[42] J. Brandts. [Review of: W. Hundsdorfer, J.G. Verwer (2003) Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations] , 2006 .
[43] Willem Hundsdorfer,et al. RKC time-stepping for advection-diffusion-reaction problems , 2004 .
[44] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[45] John Christian Linford,et al. Detecting Load Imbalance in Massively Parallel Applications Internship Report , 2009 .
[46] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[47] Lisa Schweitzer,et al. The Sustainable Mobility Learning Laboratory: Interactive Web-Based Education on Transportation and the Environment , 2008 .
[48] R. Jackson,et al. General mass action kinetics , 1972 .
[49] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[50] J. Verwer. Explicit Runge-Kutta methods for parabolic partial differential equations , 1996 .
[51] Paulette Middleton,et al. Aggregation and analysis of volatile organic compound emissions for regional modeling , 1990 .
[52] Felix Wolf,et al. Replay-Based Synchronization of Timestamps in Event Traces of Massively Parallel Applications , 2008, 2008 International Conference on Parallel Processing - Workshops.
[53] William J. Dally,et al. The Imagine Stream Processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[54] J. Lambert. Numerical Methods for Ordinary Differential Systems: The Initial Value Problem , 1991 .
[55] Anwar Ghuloum. Future Proof Data Parallel Algorithms and Software on Intel Multicore Architecture , 2007 .
[56] Julien Langou,et al. Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems , 2007, Int. J. High Perform. Comput. Appl..
[57] Pawel Gepner,et al. Second Generation Quad-Core Intel Xeon Processors Bring 45 nm Technology and a New Level of Performance to HPC Applications , 2008, ICCS.
[58] Philip S. Yu,et al. SPADE: the System S declarative stream processing engine , 2008, SIGMOD Conference.
[59] Kwoh Chee Keong,et al. Applications of Heterogeneous Structure of Cell Broadband Engine Architecture for Biological Database Similarity Search , 2008, 2008 2nd International Conference on Bioinformatics and Biomedical Engineering.
[60] Tuning and Analysis Utilities , 2011, Encyclopedia of Parallel Computing.
[61] V. Natoli,et al. Exploring New Architectures in Accelerating CFD for Air Force Applications , 2008, 2008 DoD HPCMP Users Group Conference.
[62] Rüdiger Westermann,et al. Linear algebra operators for GPU implementation of numerical algorithms , 2003, SIGGRAPH Courses.
[63] Kalyan S. Perumalla. Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs) , 2006, 20th Workshop on Principles of Advanced and Distributed Simulation (PADS'06).
[64] Bo Li,et al. Optimized Implementation of Ray Tracing on Cell Broadband Engine , 2008, 2008 International Conference on Multimedia and Ubiquitous Engineering (mue 2008).
[65] Charles Hirsch,et al. Numerical computation of internal and external flows (vol. 1: Fundamentals of numerical discretization) , 1991 .
[66] Dinesh Manocha,et al. LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[67] Arie E. Kaufman,et al. GPU Cluster for High Performance Computing , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[68] Manish Vachharajani,et al. GPU acceleration of numerical weather prediction , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[69] Felix Wolf,et al. Scalable timestamp synchronization for event traces of message-passing applications , 2009, Parallel Comput..
[70] Guang R. Gao,et al. Software-Pipelining on Multi-Core Architectures , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[71] Linda R. Petzold,et al. Runge-Kutta-Chebyshev projection method , 2006, J. Comput. Phys..
[72] Assyr Abdulle,et al. Second order Chebyshev methods based on orthogonal polynomials , 2001, Numerische Mathematik.
[73] B. Flachs,et al. The microarchitecture of the synergistic processor for a cell processor , 2006, IEEE Journal of Solid-State Circuits.
[74] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[75] Adrian Sandu,et al. Adjoint sensitivity analysis of regional air quality models , 2005 .
[76] Dimitrios S. Nikolopoulos,et al. DMA-based prefetching for I/O-intensive workloads on the Cell architecture , 2008, CF '08.
[77] William J. Dally,et al. Executing irregular scientific applications on stream architectures , 2007, ICS '07.
[78] Jason N. Dale,et al. Cell Broadband Engine Architecture and its first implementation - A performance view , 2007, IBM J. Res. Dev..
[79] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[80] John L. Klepeis,et al. Anton, a special-purpose machine for molecular dynamics simulation , 2007, ISCA '07.
[81] Guang R. Gao,et al. Programming Experience on Cyclops-64 Multi-Core Chip Architecture , 2022 .
[82] Adrian Sandu,et al. Improved Quasi-Steady-State-Approximation Methods for Atmospheric Chemistry Integration , 1997, SIAM J. Sci. Comput..
[83] David A. Bader,et al. High performance MPEG-2 software decoder on the cell broadband engine , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[84] David Gregg,et al. Streamlining Offload Computing to High Performance Architectures , 2009, ICCS.
[85] Adrian Sandu,et al. Optimizing large scale chemical transport models for multicore platforms , 2008, SpringSim '08.
[86] Willem Hundsdorfer,et al. Numerical Solution of Advection-Diffusion-Reaction Equations , 1996 .
[87] Keith A. Duke,et al. A Professional Graphics Controller , 1985, IBM Syst. J..
[88] Adrian Sandu,et al. Vector stream processing for effective application of heterogeneous parallelism , 2009, SAC '09.
[89] Michael Lang,et al. Entering the petaflop era: The architecture and performance of Roadrunner , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[90] F. Kirchner,et al. A new mechanism for regional atmospheric chemistry modeling , 1997 .
[91] M. C. Dodge,et al. A photochemical kinetics mechanism for urban and regional scale computer modeling , 1989 .
[92] Georg A. Grell,et al. Fully coupled “online” chemistry within the WRF model , 2005 .
[93] Khaled Z. Ibrahim,et al. Implementing Wilson-Dirac operator on the cell broadband engine , 2008, ICS '08.
[94] G. Strang. On the Construction and Comparison of Difference Schemes , 1968 .
[95] W. Stockwell,et al. The second generation regional acid deposition model chemical mechanism for regional air quality modeling , 1990 .
[96] Benjamin Rose,et al. A comparison of programming models for multiprocessors with explicitly managed memory hierarchies , 2009, PPoPP '09.
[97] William P. L. Carter. A DETAILED MECHANISM FOR THE GAS-PHASE ATMOSPHERIC REACTIONS OF ORGANIC COMPOUNDS , 1990 .
[98] Hidemasa Muta,et al. Multilevel parallelization on the cell/B.E. for a motion JPEG 2000 encoding server , 2007, ACM Multimedia.
[99] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..
[100] J. Verwer,et al. Numerical solution of time-dependent advection-diffusion-reaction equations , 2003 .
[101] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[102] Adrian Sandu,et al. Performance of stabilized explicit time integration methods for parallel air quality models , 2007, SpringSim '07.
[103] Victor Eijkhout,et al. Self-Adapting Linear Algebra Algorithms and Software , 2005, Proceedings of the IEEE.
[104] D. Byun. Science algorithms of the EPA Models-3 community multi-scale air quality (CMAQ) modeling system , 1999 .
[105] I. Wald,et al. Ray Tracing on the Cell Processor , 2006, 2006 IEEE Symposium on Interactive Ray Tracing.
[106] H. Najm,et al. High-order spatial discretizations and extended stability methods for reacting flows on structured adaptively refined meshes , 2022 .
[107] Renjian Zhang,et al. Evaluation of the Models-3 Community Multi-scale Air Quality (CMAQ) modeling system with observations obtained during the TRACE-P experiment: Comparison of ozone and its related species , 2006 .