An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay
暂无分享,去创建一个
[1] M. Head‐Gordon,et al. A multipole acceptability criterion for electronic structure theory , 1998 .
[2] Michael J. Frisch,et al. A linear scaling method for Hartree–Fock exchange calculations of large molecules , 1996 .
[3] D. Hilbert. Über die stetige Abbildung einer Linie auf ein Flächenstück , 1935 .
[4] P. Colella,et al. Local adaptive mesh refinement for shock hydrodynamics , 1989 .
[5] John Strain,et al. The Fast Gauss Transform with Variable Scales , 1991, SIAM J. Sci. Comput..
[6] G. Peano. Sur une courbe, qui remplit toute une aire plane , 1890 .
[7] Hanan Samet,et al. Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.
[8] Anastasia Ailamaki,et al. Improving hash join performance through prefetching , 2004, Proceedings. 20th International Conference on Data Engineering.
[9] Zhimin Gu,et al. Performance comparison of data prefetching for pointer-chasing applications , 2009, 2009 First International Conference on Information Science and Engineering.
[10] Per Stenström,et al. A prefetching technique for irregular accesses to linked data structures , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[11] Irene Gargantini,et al. An effective way to represent quadtrees , 1982, CACM.
[12] Vlastimil Havran,et al. Hierarchical visibility culling with occlusion trees , 1998, Proceedings. Computer Graphics International (Cat. No.98EX149).
[13] Eric Schwegler,et al. Fast assembly of the Coulomb matrix: A quantum chemical tree code , 1996 .
[14] Gustavo E. Scuseria,et al. Linear scaling conjugate gradient density matrix search as an alternative to diagonalization for first principles electronic structure calculations , 1997 .
[15] James Demmel,et al. Stability of block algorithms with fast level-3 BLAS , 1992, TOMS.
[16] L. Greengard,et al. Regular Article: A Fast Adaptive Multipole Algorithm in Three Dimensions , 1999 .
[17] Jeremy D. Frens,et al. Language support for Morton-order matrices , 2001, PPoPP '01.
[18] Margaret H. Dunham,et al. Join processing in relational databases , 1992, CSUR.
[19] Michele Benzi,et al. Decay Properties of Spectral Projectors with Applications to Electronic Structure , 2012, SIAM Rev..
[20] Sriram Krishnamoorthy,et al. Data-driven fault tolerance for work stealing computations , 2012, ICS '12.
[21] Michael S. Warren,et al. A Parallel, Portable and Versatile Treecode , 1995, PPSC.
[22] David J. DeWitt,et al. Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines , 1990, VLDB.
[23] Hanan Samet,et al. A Fast Similarity Join Algorithm Using Graphics Processing Units , 2008, 2008 IEEE 24th International Conference on Data Engineering.
[24] Bruno Arnaldi,et al. New trends in collision detection performance , 2009 .
[25] Leslie Greengard,et al. A fast algorithm for particle simulations , 1987 .
[26] Giulia Galli,et al. Linear scaling methods for electronic structure calculations and quantum molecular dynamics simulations , 1996 .
[27] Yang,et al. Direct calculation of electron density in density-functional theory. , 1991, Physical review letters.
[28] Dario Bini,et al. Stability of fast algorithms for matrix multiplication , 1980 .
[29] Laxmikant V. Kalé,et al. Work stealing and persistence-based load balancers for iterative overdecomposed applications , 2012, HPDC '12.
[30] Karen D. Devinea,et al. New Challenges in Dynamic Load Balancing , 2004 .
[31] Paul H. J. Kelly,et al. Improving the Performance of Morton Layout by Array Alignment and Loop Unrolling: Reducing the Price of Naivety , 2003, LCPC.
[32] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[33] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput..
[34] A. V. Duin,et al. A Divide-and-Conquer/Cellular-Decomposition Framework for Million-to-Billion Atom Simulations of Chemical Reactions , 2007 .
[35] George Roussos,et al. A New Error Estimate of the Fast Gauss Transform , 2002, SIAM J. Sci. Comput..
[36] Lars Karlsson,et al. Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion , 2012, TOMS.
[37] L. Greengard,et al. A new version of the Fast Multipole Method for the Laplace equation in three dimensions , 1997, Acta Numerica.
[38] Michele Benzi,et al. Orderings for Factorized Sparse Approximate Inverse Preconditioners , 1999, SIAM J. Sci. Comput..
[39] George E. Karniadakis,et al. A sharp error estimate for the fast Gauss transform , 2006, J. Comput. Phys..
[40] Michael S. Warren,et al. Astrophysical N-body simulations using hierarchical tree data structures , 1992, Proceedings Supercomputing '92.
[41] A. Szabó,et al. Modern quantum chemistry : introduction to advanced electronic structure theory , 1982 .
[42] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[43] David E. Manolopoulos,et al. Canonical purification of the density matrix in electronic-structure theory , 1998 .
[44] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[45] Richard E. Ladner,et al. Algorithms to Take Advantage of Hardware Prefetching , 2007, ALENEX.
[46] Eric Schwegler,et al. Modern Developments in Hartree-Fock Theory: Fast Methods for Computing the Coulomb Matrix , 1996 .
[47] Keshav Pingali,et al. Is Cache-Oblivious DGEMM Viable? , 2006, PARA.
[48] Srinivas Aluru,et al. Parallel domain decomposition and load balancing using space-filling curves , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[49] Hanan Samet,et al. Iterative spatial join , 2003, TODS.
[50] Chia-Lin Yang,et al. Push vs. pull: data movement for linked data structures , 2000, ICS '00.
[51] Min Zhou,et al. Experiences and lessons learned with a portable interface to hardware performance counters , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[52] Joost VandeVondele,et al. Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase. , 2012, Journal of chemical theory and computation.
[53] R. Mcweeny,et al. The density matrix in self-consistent field theory I. Iterative construction of the density matrix , 1956, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[54] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[55] Michael S. Warren,et al. A portable parallel particle program , 1995 .
[56] M. Challacombe. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory , 2000 .
[57] S. Goedecker. Electronic Structure Methods Exhibiting Linear Scaling of the Computational Effort with Respect to the Size of the System , 1998, cond-mat/9806073.
[58] Hanan Samet,et al. Data-Parallel Spatial Join Algorithms , 1994, 1994 International Conference on Parallel Processing Vol. 3.
[59] Virginia Vassilevska Williams,et al. Multiplying matrices faster than coppersmith-winograd , 2012, STOC '12.
[60] Wei-Fen Lin,et al. Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[61] Eric Schwegler,et al. Linear scaling computation of the Fock matrix. II. Rigorous bounds on exchange integrals and incremental Fock build , 1997 .
[62] Donald Yeung,et al. The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems , 2004, J. Instr. Level Parallelism.
[63] S. Chandrasekaran,et al. A fast adaptive solver for hierarchically semiseparable representations , 2005 .
[64] Guohua Jin,et al. Using Space-filling Curves for Computation Reordering , 2005 .
[65] V. Strassen. Gaussian elimination is not optimal , 1969 .
[66] Wolfgang Hackbusch,et al. Construction and Arithmetics of H-Matrices , 2003, Computing.
[67] Esteban Hernandez Barragan,et al. Performance analisys on multicore system using PAPI , 2011, 2011 6th Colombian Computing Congress (CCC).
[68] Stamatis Vassiliadis,et al. SAD Prefetching for MPEG4 Using Flux Caches , 2006, SAMOS.
[69] Arieh Iserles. How Large is the Exponential of a Banded Matrix , 1999 .
[70] D. Truhlar. Recent progress in density functional theory , 2014 .
[71] Michael S. Warren,et al. A parallel hashed oct-tree N-body algorithm , 1993, Supercomputing '93. Proceedings.
[72] Michael Bader,et al. Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves , 2007, PPAM.
[73] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[74] Matt Challacombe,et al. Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix , 2003 .
[75] Shivkumar Chandrasekaran,et al. A Fast ULV Decomposition Solver for Hierarchically Semiseparable Representations , 2006, SIAM J. Matrix Anal. Appl..
[76] Ken Kennedy,et al. Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings , 2001, International Journal of Parallel Programming.
[77] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[78] Leslie Greengard,et al. The Fast Gauss Transform , 1991, SIAM J. Sci. Comput..
[79] Stefan Goedecker,et al. Linear scaling electronic structure methods in chemistry and physics , 2003, Comput. Sci. Eng..
[80] Jon Louis Bentley,et al. Quad trees a data structure for retrieval on composite keys , 1974, Acta Informatica.
[81] Eric Schwegler,et al. Linear scaling computation of the Fock matrix , 1997 .
[82] Greg Stitt,et al. Recursion flattening , 2008, GLSVLSI '08.
[83] Marsha Berger,et al. Three-Dimensional Adaptive Mesh Refinement for Hyperbolic Conservation Laws , 1994, SIAM J. Sci. Comput..
[84] James Demmel,et al. Fast matrix multiplication is stable , 2006, Numerische Mathematik.
[85] Christian Ochsenfeld,et al. Linear and sublinear scaling formation of Hartree-Fock-type exchange matrices , 1998 .
[86] William F. Moss,et al. Decay rates for inverses of band matrices , 1984 .
[87] Tai-Sung Lee,et al. Linear-scaling quantum mechanical calculations of biological molecules: The divide-and-conquer approach , 1998 .
[88] Peiyi Tang,et al. Complete inlining of recursive calls: beyond tail-recursion elimination , 2006, ACM-SE 44.
[89] Sylvain Lefebvre,et al. Perfect spatial hashing , 2006, SIGGRAPH 2006.
[90] Peter H. Beckman. Parallel LU decomposition for sparse matrices using quadtrees on a shared-heap multiprocessor , 1993 .
[91] Ramani Duraiswami,et al. Improved Fast Gauss Transform , 2003 .
[92] Matt Challacombe,et al. Linear scaling computation of the Fock matrix. V. Hierarchical Cubature for numerical integration of the exchange-correlation matrix , 2000 .
[93] Thomas Lewiner,et al. Fast Generation of Pointerless Octree Duals , 2010, Comput. Graph. Forum.
[94] P. Campbell,et al. Dynamic Octree Load Balancing Using Space-Filling Curves ∗ , 2003 .
[95] Hanan Samet,et al. The Design and Analysis of Spatial Data Structures , 1989 .
[96] Matt Challacombe,et al. Fast Multiplication of Matrices with Decay , 2010, ArXiv.
[97] David R. Bowler,et al. Recent progress in linear scaling ab initio electronic structure techniques , 2002 .
[98] V. Rokhlin,et al. Rapid Evaluation of Potential Fields in Three Dimensions , 1988 .
[99] S. Goedecker. Linear scaling electronic structure methods , 1999 .
[100] Don Coppersmith,et al. Matrix multiplication via arithmetic progressions , 1987, STOC.
[101] Raphael Yuster,et al. Fast sparse matrix multiplication , 2004, TALG.
[102] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.
[103] D. Bowler,et al. O(N) methods in electronic structure calculations. , 2011, Reports on progress in physics. Physical Society.
[104] Gustavo E. Scuseria,et al. Semiempirical methods with conjugate gradient density matrix search to replace diagonalization for molecular systems containing thousands of atoms , 1997 .
[105] R. Mcweeny,et al. The density matrix in self-consistent field theory II. Applications in the molecular orbital theory of conjugated systems , 1956, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[106] Pradeep Dubey,et al. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs , 2009, Proc. VLDB Endow..
[107] Matt Challacombe,et al. A simplified density matrix minimization for linear scaling self-consistent field theory , 1999 .
[108] M. Benzi,et al. DECAY BOUNDS AND ( ) ALGORITHMS FOR APPROXIMATING FUNCTIONS OF SPARSE MATRICES , 2007 .
[109] Keshav Pingali,et al. An experimental comparison of cache-oblivious and cache-conscious programs , 2007, SPAA '07.
[110] James Coole,et al. Traversal caches: a first step towards FPGA acceleration of pointer-based data structures , 2008, CODES+ISSS '08.
[111] Sally A. McKee,et al. Can hardware performance counters be trusted? , 2008, 2008 IEEE International Symposium on Workload Characterization.
[112] L. Hernquist. Hierarchical N-body methods , 1987 .
[113] Weitao Yang,et al. A density‐matrix divide‐and‐conquer approach for electronic structure calculations of large molecules , 1995 .
[114] M. Berger,et al. Adaptive mesh refinement for hyperbolic partial differential equations , 1982 .
[115] Anders M. N. Niklasson,et al. Trace resetting density matrix purification in O(N) self-consistent-field theory , 2003 .
[116] David S. Wise. Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free , 2000, Euro-Par.
[117] Matt Challacombe,et al. Density matrix perturbation theory. , 2003, Physical review letters.
[118] David S. Wise,et al. The Opie compiler from row-major source to Morton-ordered matrices , 2004, WMPI '04.
[119] Rasmus Pagh,et al. Faster join-projects and sparse matrix multiplications , 2009, ICDT '09.
[120] James R. Larus,et al. Making Pointer-Based Data Structures Cache Conscious , 2000, Computer.
[122] Siddhartha Chatterjee,et al. The Combinatorics of Cache Misses during Matrix Multiplication , 2001, J. Comput. Syst. Sci..
[123] G. Golub,et al. Bounds for the Entries of Matrix Functions with Applications to Preconditioning , 1999 .
[124] Jehee Lee,et al. Linkless Octree Using Multi‐Level Perfect Hashing , 2009, Comput. Graph. Forum.
[125] David S. Wise,et al. Analyzing block locality in Morton-order and Morton-hybrid matrices , 2007, CARN.
[126] David S. Wise,et al. Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms , 2006, MSPC '06.
[127] David S. Wise,et al. Experiments with Quadtree Representation of Matrices , 1988, ISSAC.
[128] David S. Wise. Representing matrices as quadtrees for parallel processors: extended abstract , 1984, SIGS.
[129] David S. Wise,et al. Costs of Quadtree Representation of Nondense Matrices , 1990, J. Parallel Distributed Comput..
[130] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[131] Kathryn S. McKinley,et al. Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.
[132] David S. Wise,et al. Representation-transparent matrix algorithms with scalable performance , 2007, ICS '07.
[133] He Yong. Survey on real-time collision detection algorithms , 2008 .