X10 for High-Performance Scientific Computing
[1] T. Darden,et al. Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems , 1993 .
[2] Arch D. Robison,et al. Structured Parallel Programming: Patterns for Efficient Computation , 2012 .
[3] Gustavo E. Scuseria,et al. A fast multipole method for periodic systems with arbitrary unit cell geometries , 1998 .
[4] Makoto Taiji,et al. 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[5] Grant S. Heffelfinger,et al. Parallel atomistic simulations , 2000 .
[6] Michael S. Warren,et al. Astrophysical N-body simulations using hierarchical tree data structures , 1992, Proceedings Supercomputing '92.
[7] Robert J. Harrison,et al. Asynchronous Programming in UPC: A Case Study and Potential for Improvement , 2009 .
[8] T. Darden,et al. A Multipole-Based Algorithm for Efficient Calculation of Forces and Potentials in Macroscopic Period , 1996 .
[9] Michael Voss,et al. Optimization via Reflection on Work Stealing in TBB , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[10] B. Chamberlain,et al. Authoring User-Defined Domain Maps in Chapel , 2011 .
[11] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[12] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[13] Laxmikant V. Kalé,et al. Adaptive MPI , 2003, LCPC.
[14] Vivek Sarkar,et al. Habanero-Java: the new adventures of old X10 , 2011, PPPJ.
[15] Sriram Krishnamoorthy,et al. Global Futures: A Multithreaded Execution Model for Global Arrays-based Applications , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[16] Katherine Yelick,et al. Hierarchical Work Stealing on Manycore Clusters , 2011 .
[17] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[18] Clemens C. J. Roothaan,et al. New Developments in Molecular Orbital Theory , 1951 .
[19] Lexing Ying,et al. A New Parallel Kernel-Independent Fast Multipole Method , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[20] A. Szabó,et al. Modern quantum chemistry : introduction to advanced electronic structure theory , 1982 .
[21] Richard W. Vuduc,et al. Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] V. Fock,et al. Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems , 1930 .
[23] Charles E. Leiserson,et al. The Cilk++ concurrency platform , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[24] Hari Sundar,et al. Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel , 2008, SIAM J. Sci. Comput..
[25] José E. Moreira,et al. A Volumetric FFT for BlueGene/L , 2003, HiPC.
[26] B. Tidor. Molecular dynamics simulations , 1997, Current Biology.
[27] Kenjiro Taura,et al. A Task Parallel Implementation of Fast Multipole Methods , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[28] Olivier Tardieu,et al. A work-stealing scheduler for X10's task parallelism with suspension , 2012, PPoPP '12.
[29] Vivek Sarkar,et al. Unified Analysis of Array and Object References in Strongly Typed Languages , 2000, SAS.
[30] Jarek Nieplocha,et al. Efficient Algorithms for Ghost Cell Updates on Two Classes of MPP Architectures , 2002, IASTED PDCS.
[31] David Cunningham,et al. X10 and APGAS at Petascale , 2016, ACM Trans. Parallel Comput..
[32] Adrian Prantl,et al. Interfacing Chapel with traditional HPC programming languages , 2011 .
[33] Doug Lea,et al. A Java fork/join framework , 2000, JAVA '00.
[34] John Shalf,et al. The International Exascale Software Project roadmap , 2011, Int. J. High Perform. Comput. Appl..
[35] Richard W. Vuduc,et al. Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Dhabaleswar K. Panda,et al. High Performance Remote Memory Access Communication: The Armci Approach , 2006, Int. J. High Perform. Comput. Appl..
[37] Sriram Krishnamoorthy,et al. Lifeline-based global load balancing , 2011, PPoPP '11.
[38] Rick Stevens,et al. Toward high‐performance computational chemistry: II. A scalable self‐consistent field program , 1996 .
[39] J. Kussmann,et al. Linear‐Scaling Methods in Quantum Chemistry , 2007 .
[40] David E. Bernholdt,et al. Programmability of the HPCS Languages: A Case Study with a Quantum Chemistry Kernel (Extended Version) , 2008 .
[41] Alistair P. Rendell,et al. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10 , 2014, J. Comput. Chem..
[42] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[43] Kiyokuni Kawachiya,et al. Distributed garbage collection for managed X10 , 2012, X10 '12.
[44] Robert J. Harrison,et al. Performance and experience with LAPI-a new high-performance communication library for the IBM RS/6000 SP , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[45] Toshikazu Ebisuzaki,et al. Hardware accelerator for molecular dynamics: MDGRAPE-2 , 2003 .
[46] C. H. Flood,et al. The Fortress Language Specification , 2007 .
[47] S. Guan,et al. Ion traps for Fourier transform ion cyclotron resonance mass spectrometry: principles and design of geometric and electric configurations , 1995 .
[48] B. Chamberlain,et al. The State of the Chapel Union , 2013 .
[49] Philip Heidelberger,et al. The IBM Blue Gene/Q interconnection network and message unit , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[50] Michael J Frisch,et al. Efficient evaluation of short-range Hartree-Fock exchange in large molecules and periodic systems. , 2006, The Journal of chemical physics.
[51] Michela Taufer,et al. FENZI: GPU-Enabled Molecular Dynamics Simulations of Large Membrane Regions Based on the CHARMM Force Field and PME , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[52] Benny G. Johnson,et al. Linear scaling density functional calculations via the continuous fast multipole method , 1996 .
[53] R. Heeren,et al. Comparison of particle-in-cell simulations with experimentally observed frequency shifts between ions of the same mass-to-charge in fourier transform ion cyclotron resonance mass spectrometry , 2010, Journal of the American Society for Mass Spectrometry.
[54] Sandia Report,et al. Toward a New Metric for Ranking High Performance Computing Systems , 2013 .
[55] Michele Colajanni,et al. PSBLAS: a library for parallel linear algebra computation on sparse matrices , 2000, TOMS.
[56] L. Patacchini,et al. Explicit time-reversible orbit integration in Particle In Cell codes with static homogeneous magnetic field , 2009, J. Comput. Phys..
[57] V. Springel,et al. GADGET: a code for collisionless and gasdynamical cosmological simulations , 2000, astro-ph/0003162.
[58] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[59] Scott R. Kohn,et al. High-performance language interoperability for scientific computing through Babel , 2012, Int. J. High Perform. Comput. Appl..
[60] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[61] Holger Dachsel,et al. Fast and accurate determination of the Wigner rotation matrices in the fast multipole method. , 2006, The Journal of chemical physics.
[62] Mark S. Gordon,et al. General atomic and molecular electronic structure system , 1993, J. Comput. Chem..
[63] M. Snir,et al. Ghost Cell Pattern , 2010, ParaPLoP '10.
[64] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[65] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[66] David Grove,et al. Supporting Array Programming in X10 , 2014, ARRAY@PLDI.
[67] Mark S. Gordon,et al. New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock. , 2012, Journal of chemical theory and computation.
[68] Vivek Sarkar,et al. Hierarchical phasers for scalable synchronization and reductions in dynamic parallelism , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[69] Bradford L. Chamberlain. The design and implementation of a region-based parallel language , 2001 .
[70] James Demmel,et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, PARA.
[71] Josh Milthorpe,et al. Resolutions of the Coulomb Operator: VII. Evaluation of Long-Range Coulomb and Exchange Matrices. , 2013, Journal of chemical theory and computation.
[72] Mark F. Adams,et al. Chombo Software Package for AMR Applications Design Document , 2014 .
[73] Martin Head-Gordon,et al. Rotating around the quartic angular momentum barrier in fast multipole method calculations , 1996 .
[74] Leslie Lamport,et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.
[75] H. Berendsen. Simulating the Physical World , 2004 .
[76] Volker Dyczmons,et al. No N4-dependence in the calculation of large molecules , 1973 .
[77] Vivek Sarkar,et al. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement , 2009, LCPC.
[78] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[79] R. Heeren,et al. Realistic modeling of ion cloud motion in a Fourier transform ion cyclotron resonance cell by use of a particle-in-cell approach. , 2007, Rapid communications in mass spectrometry : RCM.
[80] Peter M W Gill,et al. Resolutions of the Coulomb operator. VI. Computation of auxiliary integrals. , 2011, The Journal of chemical physics.
[81] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, Hot Interconnects.
[82] C. Birdsall,et al. Plasma Physics via Computer Simulation , 2018 .
[83] Torsten Hoefler,et al. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[84] David Grove,et al. X10 as a Parallel Language for Scientific Computation: Practice and Experience , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[85] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[86] Toyotaro Suzumura,et al. Scalable performance of ScaleGraph for large scale graph analysis , 2012, 2012 19th International Conference on High Performance Computing.
[87] Richard W. Vuduc,et al. A massively parallel adaptive fast-multipole method on heterogeneous architectures , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[88] Jeffrey C. Carver,et al. Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[89] William N. Scherer,et al. A new vision for coarray Fortran , 2009, PGAS '09.
[90] Leslie Greengard,et al. A fast algorithm for particle simulations , 1987 .
[91] Jason Duell,et al. Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations , 2004, Int. J. High Perform. Comput. Netw..
[92] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[93] Guangwen Yang,et al. Characterization of Smith-Waterman sequence database search in X10 , 2012, X10 '12.
[94] Martin Head-Gordon,et al. A Resolution-Of-The-Identity Implementation of the Local Triatomics-In-Molecules Model for Second-Order Møller-Plesset Perturbation Theory with Application to Alanine Tetrapeptide Conformational Energies. , 2005, Journal of chemical theory and computation.
[95] Tjerk P. Straatsma,et al. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations , 2010, Comput. Phys. Commun..
[96] Peter M. Kasson,et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit , 2013, Bioinform..
[97] David Cunningham,et al. A performance model for X10 applications: what's going on under the hood? , 2011, X10 '11.
[98] D. Zorin,et al. A kernel-independent adaptive fast multipole algorithm in two and three dimensions , 2004 .
[99] Stephen W. Taylor,et al. KWIK: Coulomb Energies in O(N) Work , 1996 .
[100] Marco Häser,et al. Improvements on the direct SCF method , 1989 .
[101] David Cunningham,et al. M3R: Increased performance for in-memory Hadoop jobs , 2012, Proc. VLDB Endow..
[102] Jakub Kurzak,et al. Massively parallel implementation of a fast multipole method for distributed memory machines , 2005, J. Parallel Distributed Comput..
[103] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[104] T. Darden,et al. A smooth particle mesh Ewald method , 1995 .
[105] Martin Head-Gordon,et al. Derivation and efficient implementation of the fast multipole method , 1994 .
[106] Alistair P. Rendell,et al. PGAS‐FMM: Implementing a distributed fast multipole method using the X10 programming language , 2014, Concurr. Comput. Pract. Exp..
[107] Hans Peter Lüthi,et al. A coarse‐grain parallel implementation of the direct SCF method , 1992 .
[108] Richard W. Vuduc,et al. Brief announcement: towards a communication optimal fast multipole method and its implications at exascale , 2012, SPAA '12.
[109] Koichi Tanaka,et al. Influence of Ion-Ion Coulomb Interactions on FT-ICR Mass Spectra at a High Magnetic Field: A Many-Particle Simulation Using a Special-Purpose Computer , 2010 .
[110] William N. Scherer,et al. Hiding latency in Coarray Fortran 2.0 , 2010, PGAS '10.
[111] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[112] Michael Gschwind,et al. The IBM Blue Gene/Q Compute Chip , 2012, IEEE Micro.
[113] Sreedhar B. Kodali,et al. The Asynchronous Partitioned Global Address Space Model , 2010 .
[114] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[115] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[116] R. Heeren,et al. Fourier Transform Ion Cyclotron Resonance Mass Resolution and Dynamic Range Limits Calculated by Computer Modeling of Ion Cloud Motion , 2012, Journal of The American Society for Mass Spectrometry.
[117] Ken Thompson,et al. The UNIX time-sharing system , 1974, CACM.
[118] Christos Davatzikos,et al. Low-constant parallel algorithms for finite element simulations using linear octrees , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[119] Rio Yokota,et al. An FMM Based on Dual Tree Traversal for Many-Core Architectures , 2012, ArXiv.
[120] L. Greengard,et al. A new version of the Fast Multipole Method for the Laplace equation in three dimensions , 1997, Acta Numerica.
[121] Martin Head-Gordon,et al. Advances in Methods and Algorithms in a Modern Quantum Chemistry Program Package , 2006 .
[122] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[123] Taweetham Limpanuparb,et al. Applications of Resolutions of the Coulomb Operator in Quantum Chemistry , 2012 .
[124] Ivan S Ufimtsev,et al. Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. , 2008, Journal of chemical theory and computation.
[125] Jarek Nieplocha,et al. Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit , 2006, Int. J. High Perform. Comput. Appl..
[126] A. Marshall,et al. Fourier transform ion cyclotron resonance mass spectrometry: a primer. , 1998, Mass spectrometry reviews.
[127] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..
[128] J. Demmel,et al. Sun Microsystems , 1996 .
[129] Toyotaro Suzumura,et al. Introducing ScaleGraph: an X10 library for billion scale graph analytics , 2012, X10 '12.
[130] Douglas Thain,et al. Qthreads: An API for programming with millions of lightweight threads , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[131] Phillip Colella,et al. Parallel Languages and Compilers: Perspective From the Titanium Experience , 2007, Int. J. High Perform. Comput. Appl..
[132] Carsten Kutzner,et al. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. , 2008, Journal of chemical theory and computation.
[133] Junichiro Makino,et al. 4.45 Pflops astrophysical N-body simulation on K computer -- The gravitational trillion-body problem , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[134] Vijay A. Saraswat,et al. A Resilient Framework for Iterative Linear Algebra Applications in X10 , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[135] M. Deserno,et al. HOW TO MESH UP EWALD SUMS. II. AN ACCURATE ERROR ESTIMATE FOR THE PARTICLE-PARTICLE-PARTICLE-MESH ALGORITHM , 1998, cond-mat/9807100.
[136] Amith R. Mamidala,et al. PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[137] Shridhar R. Gadre,et al. Structure and Stability of Water Clusters (H2O)n, n = 8-20: An Ab Initio Investigation , 2001 .
[138] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[139] Hatem Ltaief,et al. Data‐driven execution of fast multipole methods , 2012, Concurr. Comput. Pract. Exp..
[140] Guy E. Blelloch,et al. The data locality of work stealing , 2000, SPAA.
[141] Kenjiro Taura,et al. MassiveThreads: A Thread Library for High Productivity Languages , 2014, Concurrent Objects and Beyond.
[142] R W Hockney,et al. Computer Simulation Using Particles , 1966 .
[143] Michael Klemm,et al. A Proposal for Task-Generating Loops in OpenMP , 2013, IWOMP.
[144] Alistair P. Rendell,et al. Efficient update of ghost regions using active messages , 2012, 2012 19th International Conference on High Performance Computing.
[145] G. G. Hall. The molecular orbital theory of chemical valency VIII. A method of calculating ionization potentials , 1951, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[146] Eduard Ayguadé,et al. A Library Implementation of the Nano-Threads Programming Model , 1996, Euro-Par, Vol. II.
[147] Sadaf R. Alam,et al. DARPA's HPCS Program- History, Models, Tools, Languages , 2008, Adv. Comput..
[148] Vivek Sarkar,et al. Array optimizations for parallel implementations of high productivity languages , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.