ECP Software Technology Capability Assessment Report
Rajeev Thakur | Jonathan Carter | Lois Curfman McInnes | Michael A. Heroux | James Ahrens | J. Robert Neely | Jeffrey S. Vetter
[1] Dmitri Kuzmin,et al. Sequential limiting in continuous and discontinuous Galerkin methods for the Euler equations , 2018, J. Comput. Phys..
[2] V. E. Henson,et al. BoomerAMG: a parallel algebraic multigrid solver and preconditioner , 2002 .
[3] Kenneth Moreland,et al. Visualization for Exascale: Portable Performance is Critical , 2015, Supercomput. Front. Innov..
[4] Kwan-Liu Ma,et al. Flexible Analysis Software for Emerging Architectures , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[5] Martin Schulz,et al. Thread-local concurrency: a technique to handle data race detection at programming model abstraction , 2018, HPDC.
[6] Rajeev Thakur,et al. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming , 2010, Int. J. High Perform. Comput. Appl..
[7] William Gropp,et al. Fault Tolerance in Message Passing Interface Programs , 2004, Int. J. High Perform. Comput. Appl..
[8] Stephen L. Olivier,et al. OpenMPIR: Implementing OpenMP Tasks with Tapir , 2017, LLVM-HPC@SC.
[9] The MPI Forum. MPI: a message passing interface , 1993, Supercomputing '93.
[10] Reid Priedhorsky,et al. Charliecloud: Unprivileged Containers for User-Defined Software Stacks in HPC , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Hubert Ritzdorf,et al. The scalable process topology interface of MPI 2.2 , 2011, Concurr. Comput. Pract. Exp..
[12] Peng Li,et al. Combining events and threads for scalable network services implementation and evaluation of monadic, application-level concurrency primitives , 2007, PLDI '07.
[13] Patrick S. McCormick,et al. Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[14] Jonathan Green,et al. Multi-core and Network Aware MPI Topology Functions , 2011, EuroMPI.
[15] Guy E. Blelloch,et al. Vector Models for Data-Parallel Computing , 1990 .
[16] Jeremy S. Meredith,et al. Parallel in situ coupling of simulation with a fully featured visualization system , 2011, EGPGV '11.
[17] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[18] Torsten Hoefler,et al. MPI on Millions of Cores , 2022 .
[19] Martin Schulz,et al. Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[20] F. Cappello,et al. Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[21] Jeffrey Cornelis,et al. The Communication-Hiding Conjugate Gradient Method with Deep Pipelines , 2018, ArXiv.
[22] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[23] Akinori Yonezawa,et al. StackThreads/MP: integrating futures into calling standards , 1999, PPoPP '99.
[24] Kevin T. Pedretti,et al. A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds , 2017, 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom).
[25] Michael Bauer,et al. S3D-Legion : An Exascale Software for Direct Numerical Simulation of Turbulent Combustion with Complex Multicomponent Chemistry , 2017 .
[26] Gregory Becker,et al. Managing Combinatorial Software Installations with Spack , 2016, 2016 Third International Workshop on HPC User Support Tools (HUST).
[27] Vivek Sarkar,et al. Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling , 2018, CC.
[28] Tamara G. Kolda,et al. Parallel Tensor Compression for Large-Scale Scientific Data , 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[29] Abhinav Vishnu,et al. On the suitability of MPI as a PGAS runtime , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[30] Ada Gavrilovska,et al. CoMerge: toward efficient data placement in shared heterogeneous memory systems , 2017, MEMSYS.
[31] Tjerk P. Straatsma,et al. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations , 2010, Comput. Phys. Commun..
[32] Tzanio V. Kolev,et al. Multi‐material closure model for high‐order finite element Lagrangian hydrodynamics , 2016 .
[33] Bronis R. de Supinski,et al. The Spack package manager: bringing order to HPC software chaos , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Martin Schulz,et al. A Unified Platform for Exploring Power Management Strategies , 2016, 2016 4th International Workshop on Energy Efficient Supercomputing (E2SC).
[35] William Gropp,et al. PETSc Users Manual Revision 3.4 , 2016 .
[36] Pavan Balaji,et al. Process-Based Asynchronous Progress Model for MPI Point-to-Point Communication , 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[37] Veselin Dobrev,et al. Curvilinear finite elements for Lagrangian hydrodynamics , 2011 .
[38] Hank Childs,et al. Ray tracing within a data parallel framework , 2015, 2015 IEEE Pacific Visualization Symposium (PacificVis).
[39] Anders Clausen,et al. Supercomputing Centers and Electricity Service Providers: A Geographically Distributed Perspective on Demand Management in Europe and the United States , 2016, ISC.
[40] Guang R. Gao,et al. TiNy threads: a thread virtual machine for the Cyclops64 cellular architecture , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[41] Douglas Thain,et al. Qthreads: An API for programming with millions of lightweight threads , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[42] Lois C. McInnes,et al. xSDK Foundations: Toward an Extreme-scale Scientific Software Development Kit , 2017, Supercomput. Front. Innov..
[43] Michael W. Mahoney. Randomized Algorithms for Matrices and Data , 2010 .
[44] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[45] Tzanio V. Kolev,et al. High order curvilinear finite elements for elastic-plastic Lagrangian dynamics , 2014, J. Comput. Phys..
[46] Anders Logg,et al. DOLFIN: Automated finite element computing , 2010, TOMS.
[47] Leonid Oliker,et al. Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[48] Rajeev Thakur,et al. Test suite for evaluating performance of multithreaded MPI communication , 2009, Parallel Comput..
[49] Kwan-Liu Ma,et al. VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures , 2016, IEEE Computer Graphics and Applications.
[50] Alexander Aiken,et al. Realm: An event-based low-level runtime for distributed memory architectures , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[51] S. Ashby,et al. A parallel multigrid preconditioned conjugate gradient algorithm for groundwater flow simulations , 1996 .
[52] Barry Smith,et al. xSDK Community Installation Policies: GNU Autoconf and CMake Options , 2016 .
[53] Jack J. Dongarra,et al. Improving the Performance of CA-GMRES on Multicores with Multiple GPUs , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[54] Prabhat,et al. Extreme Scaling of Production Visualization Software on Diverse Architectures , 2010, IEEE Computer Graphics and Applications.
[55] Jungwon Kim,et al. PapyrusKV: A High-Performance Parallel Key-Value Store for Distributed NVM Architectures , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[56] Manuel Quezada de Luna,et al. High-order local maximum principle preserving (MPP) discontinuous Galerkin finite element method for the transport equation , 2017, J. Comput. Phys..
[57] Utkarsh Ayachit,et al. The ParaView Guide: A Parallel Visualization Application , 2015 .
[58] Kwan-Liu Ma,et al. Finely-Threaded History-Based Topology Computation , 2014, EGPGV@EuroVis.
[59] James P. Ahrens,et al. PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators , 2012, EGPGV@Eurographics.
[60] Jack Dongarra,et al. Special Issue on Program Generation, Optimization, and Platform Adaptation , 2005, Proc. IEEE.
[61] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[62] Pavan Balaji,et al. Memory Compression Techniques for Network Address Management in MPI , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[63] Robert J. Fowler,et al. Multi-threaded library for many-core systems , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[64] Kamil Iskra,et al. Exploring Data Migration for Future Deep-Memory Many-Core Systems , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[65] Mathias Jacquelin,et al. Highly scalable distributed-memory sparse triangular solution algorithms , 2018, CSC.
[66] Sayantan Sur,et al. Why Is MPI So Slow? Analyzing the Fundamental Limits in Implementing MPI-3.1 , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[67] Simone Atzeni,et al. SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[68] Martin Schulz,et al. Systemwide Power Management with Argo , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[69] Alex Brooks,et al. Argobots: A Lightweight Low-Level Threading and Tasking Framework , 2018, IEEE Transactions on Parallel and Distributed Systems.
[70] Laxmikant V. Kalé,et al. Threads for Interoperable Parallel Programming , 1996, LCPC.
[71] Jack J. Dongarra,et al. Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[72] Kamil Iskra,et al. In Situ Workflows at Exascale: System Software to the Rescue , 2017, ISAV@SC.
[73] Seda Ogrenci Memik,et al. Minimizing Thermal Variation Across System Components , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[74] Katherine A. Yelick,et al. UPC++: A PGAS Extension for C++ , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[75] Edmond Chow,et al. Iterative Sparse Triangular Solves for Preconditioning , 2015, Euro-Par.
[76] Kwan-Liu Ma,et al. A classification of scientific visualization algorithms for massive threading , 2013, UltraVis@SC.
[77] Robert Latham,et al. Portable Topology-Aware MPI-I/O , 2017, 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS).
[78] Alexander S. Szalay,et al. Extreme Event Analysis in Next Generation Simulation Architectures , 2017, ISC.
[79] Gregory Becker,et al. Using Spack to Manage Software on Cray Supercomputers , 2017 .
[80] Scott B. Baden,et al. The UPC++ PGAS library for Exascale Computing , 2017, PAW@SC.
[81] C. C. Law,et al. ParaView: An End-User Tool for Large-Data Visualization , 2005, The Visualization Handbook.
[82] Guillaume Mercier,et al. Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis , 2009, 2009 International Conference on Parallel Processing.
[83] Michael E. Papka,et al. Large-Scale Data Visualization Using Parallel Data Streaming , 2001, IEEE Computer Graphics and Applications.
[84] Akinori Yonezawa,et al. Fine-grain multithreading with minimal compiler support—a cost effective approach to implementing efficient multithreading languages , 1997, PLDI '97.
[85] Edmond Chow,et al. Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs , 2015, ISC.
[86] Marvin Theimer,et al. Cooperative Task Management Without Manual Stack Management , 2002, USENIX Annual Technical Conference, General Track.
[87] Jack Dongarra,et al. Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale , 2017 .
[88] Michael W. Mahoney. Randomized Algorithms for Matrices and Data , 2011, Found. Trends Mach. Learn..
[89] Samuel Thibault,et al. A Flexible Thread Scheduler for Hierarchical Multiprocessor Machines , 2005, ArXiv.
[90] Maya Gokhale,et al. Argo NodeOS: Toward Unified Resource Management for Exascale , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[91] Pavan Balaji,et al. A Performance Study of UCX over InfiniBand , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[92] Lisa Gerhardt,et al. Shifter: Containers for HPC , 2017 .
[93] Mark S. Gordon,et al. Chapter 41 – Advances in electronic structure theory: GAMESS a decade later , 2005 .
[94] Muneeb Ali,et al. Protothreads: simplifying event-driven programming of memory-constrained embedded systems , 2006, SenSys '06.
[95] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[96] Victor Alessandrini. Intel Threading Building Blocks , 2016 .
[97] David E. Bernholdt,et al. OpenMP 4.5 Validation and Verification Suite for Device Offload , 2018, IWOMP.
[98] Pavan Balaji,et al. Advanced Thread Synchronization for Multithreaded MPI Implementations , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[99] Pavan Balaji,et al. Hexe: A Toolkit for Heterogeneous Memory Management , 2017, 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS).
[100] Mark Rice,et al. GridPACK: A Framework for Developing Power Grid Simulations on High Performance Computing Platforms , 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[101] Jungwon Kim,et al. Design and Implementation of Papyrus: Parallel Aggregate Persistent Storage , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[102] Song Fu,et al. F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[103] Satoshi Matsuoka,et al. MPI+Threads: runtime contention and remedies , 2015, PPOPP.
[104] Xiaoye S. Li,et al. A distributed-memory approximation algorithm for maximum weight perfect bipartite matching , 2018, ArXiv.
[105] George Bosilca,et al. Using software-based performance counters to expose low-level open MPI performance information , 2017, EuroMPI/USA.
[106] George C. Necula,et al. Capriccio: scalable threads for internet services , 2003, SOSP '03.
[107] Andrew M. Bradley,et al. Towards Performance Portability in a Compressible CFD Code , 2017 .
[108] Sunita Chandrasekaran,et al. OpenACC 2.5 Validation Testsuite Targeting Multiple Architectures , 2017, ISC Workshops.
[109] Tzanio V. Kolev,et al. High-Order Multi-Material ALE Hydrodynamics , 2018, SIAM J. Sci. Comput..
[110] Kevin T. Pedretti,et al. Characterizing MPI matching via trace-based simulation , 2017, EuroMPI/USA.
[111] Raymond Namyst,et al. MPC: A Unified Parallel Runtime for Clusters of NUMA Machines , 2008, Euro-Par.
[112] SunSoft. Solaris multithreaded programming guide , 1995 .
[113] Franck Cappello,et al. Distributed Monitoring and Management of Exascale Systems in the Argo Project , 2015, DAIS.
[114] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[115] Maya Gokhale,et al. A Container-Based Approach to OS Specialization for Exascale Computing , 2015, 2015 IEEE International Conference on Cloud Engineering.
[116] James P. Ahrens,et al. The ALPINE In Situ Infrastructure: Ascending from the Ashes of Strawman , 2017, ISAV@SC.
[117] Vanessa Sochat,et al. Singularity: Scientific containers for mobility of compute , 2017, PloS one.
[118] Jack Dongarra,et al. Designing SLATE: Software for Linear Algebra Targeting Exascale , 2017 .
[119] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[120] John Shalf,et al. The International Exascale Software Project roadmap , 2011, Int. J. High Perform. Comput. Appl..
[121] Utkarsh Ayachit,et al. ParaView Catalyst: Enabling In Situ Data Analysis and Visualization , 2015, ISAV@SC.
[122] Pradeep Dubey,et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.
[123] Dan Bonachea,et al. GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network , 2018, 2018 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM).
[124] Wu-chun Feng,et al. MPI-ACC: Accelerator-Aware MPI for Scientific Applications , 2016, IEEE Transactions on Parallel and Distributed Systems.
[125] Tzanio V. Kolev,et al. High-Order Curvilinear Finite Element Methods for Lagrangian Hydrodynamics , 2012, SIAM J. Sci. Comput..
[126] William Schroeder,et al. The Visualization Toolkit: An Object-Oriented Approach to 3-D Graphics , 1997 .
[127] Robert Sisneros,et al. EAVL: The Extreme-scale Analysis and Visualization Library , 2012, EGPGV@Eurographics.
[128] Hal Finkel,et al. Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading , 2017, LLVM-HPC@SC.
[129] Robert D. Falgout,et al. The Design and Implementation of hypre, a Library of Parallel High Performance Preconditioners , 2006 .
[130] David M. Beazley,et al. SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++ , 1996, Tcl/Tk Workshop.
[131] Daniel J. Rader,et al. Direct simulation Monte Carlo: The quest for speed , 2014 .
[132] Jesper Larsson Träff,et al. Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives , 2017, 2017 IEEE 24th International Conference on High Performance Computing (HiPC).
[133] Kenneth Moreland. Oh, $#*@! Exascale! The Effect of Emerging Architectures on Scientific Discovery , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[134] Ying Wai Li,et al. QMCPACK: an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids , 2018, Journal of physics. Condensed matter : an Institute of Physics journal.
[135] Kenjiro Taura,et al. MassiveThreads: A Thread Library for High Productivity Languages , 2014, Concurrent Objects and Beyond.
[136] Sriram Krishnamoorthy,et al. Work stealing for GPU‐accelerated parallel programs in a global address space framework , 2016, Concurr. Comput. Pract. Exp..
[137] Jack J. Dongarra,et al. Investigating power capping toward energy‐efficient scientific applications , 2019, Concurr. Comput. Pract. Exp..
[138] Arie Shoshani,et al. Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks , 2014, Concurr. Comput. Pract. Exp..
[139] Hank Childs,et al. VisIt: An End-User Tool for Visualizing and Analyzing Very Large Data , 2011 .
[140] Ian Briggs,et al. FLiT: Cross-platform floating-point result-consistency tester and workload , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[141] Gokhan Memik,et al. Addressing Thermal and Performance Variability Issues in Dynamic Processors , 2017 .
[142] Ralf S. Engelschall. Portable Multithreading-The Signal Stack Trick for User-Space Thread Creation , 2000, USENIX Annual Technical Conference, General Track.
[143] Michael Lang,et al. NUMA Distance for Heterogeneous Memory , 2017, MCHPC@SC.
[144] Franck Cappello,et al. FTI: High performance Fault Tolerance Interface for hybrid systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[145] Tzanio V. Kolev,et al. High-order curvilinear finite elements for axisymmetric Lagrangian hydrodynamics , 2013 .
[146] Jack J. Dongarra,et al. Incomplete Sparse Approximate Inverses for Parallel Preconditioning , 2018, Parallel Comput..