Nonlinear array layouts for hierarchical memory systems
Mithuna Thottethodi | Siddhartha Chatterjee | Alvin R. Lebeck | Vibhor V. Jain | Shyam Mundhra
[1] Mark D. Hill et al. Surpassing the TLB performance of superpages with less operating system support, 1994, ASPLOS VI.
[2] Chau-Wen Tseng et al. Eliminating conflict misses for high performance architectures, 1998, ICS '98.
[3] Christos Faloutsos et al. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve, 2001, IEEE Trans. Knowl. Data Eng.
[4] Ioana Banicescu et al. Load Balancing and Data Locality Via Fractiling: An Experimental Study, 1996.
[5] Remzi H. Arpaci-Dusseau et al. Empirical evaluation of the CRAY-T3D: a compiler perspective, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[6] D. Hilbert. Ueber die stetige Abbildung einer Line auf ein Flächenstück, 1891.
[7] H. Sagan. Space-filling curves, 1994.
[8] David Salesin et al. Wavelets for computer graphics: theory and applications, 1996.
[9] Theodore Bially et al. Space-filling curves: Their generation and their application to bandwidth reduction, 1969, IEEE Trans. Inf. Theory.
[10] V. Strassen. Gaussian elimination is not optimal, 1969.
[11] Uzi Vishkin et al. Can parallel algorithms enhance serial implementation?, 1996, CACM.
[12] Richard E. Ladner et al. Cache performance analysis of traversals and random accesses, 1999, SODA '99.
[13] G. Peano. Sur une courbe, qui remplit toute une aire plane, 1890.
[14] Wei Li et al. Unifying data and control transformations for distributed shared-memory machines, 1995, PLDI '95.
[15] Alan George et al. Computer Solution of Large Sparse Positive Definite Systems, 1981.
[16] Monica S. Lam et al. A data locality optimizing algorithm, 1991, PLDI '91.
[17] Ken Kennedy et al. Software methods for improvement of cache performance on supercomputer applications, 1989.
[18] H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes, 1998.
[19] Keshav Pingali et al. Data-centric multi-level blocking, 1997, PLDI '97.
[20] Robert Laurini. Graphical Data Bases Built on Peano Space-filling Curves, 1985, Eurographics.
[21] Ken Kennedy et al. Automatic data layout for distributed-memory machines, 1998, TOPL.
[22] John R. Gilbert et al. Optimal evaluation of array expressions on massively parallel machines, 1995, TOPL.
[23] David A. Wood et al. Active Memory: A New Abstraction for Memory System Simulation, 1997, ACM Trans. Model. Comput. Simul.
[24] M. S. Warren et al. A parallel hashed Oct-Tree N-body algorithm, 1993, Supercomputing '93.
[25] Jeremy D. Frens et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code, 1997, PPOPP '97.
[26] Chau-Wen Tseng et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[27] Leigh Stoller et al. Increasing TLB reach using superpages backed by shadow memory, 1998, ISCA.
[28] Fred G. Gustavson et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms, 1997, IBM J. Res. Dev.
[29] Richard E. Ladner et al. The influence of caches on the performance of heaps, 1996, JEAL.
[30] Utpal Banerjee et al. Loop Transformations for Restructuring Compilers: The Foundations, 1993, Springer US.
[31] Harold S. Stone et al. Footprints in the cache, 1986, SIGMETRICS '86/PERFORMANCE '86.
[32] Garth A. Gibson et al. Report of the Working Group on Storage I/O for Large-Scale Computing, 1996.
[33] Michael Wolfe et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[34] Richard E. Ladner et al. Caches and algorithms, 1996.
[35] Olivier Temam et al. Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply, 1995, TOPL.
[36] Guy L. Steele et al. The High Performance Fortran Handbook, 1993.
[37] Mithuna Thottethodi et al. Recursive array layouts and fast parallel matrix multiplication, 1999, SPAA '99.
[38] Monica S. Lam et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.
[39] J. Pasciak et al. Computer solution of large sparse positive definite systems, 1982.
[40] Ioana Banicescu et al. Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations, 1995, SC.
[41] Scott B. Baden et al. Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves, 1996, IEEE Trans. Parallel Distributed Syst.
[42] Larry Carter et al. Hierarchical tiling for improved superscalar performance, 1995, Proceedings of 9th International Parallel Processing Symposium.
[43] D. Hilbert. Über die stetige Abbildung einer Linie auf ein Flächenstück, 1935.
[44] Steven S. Muchnick et al. Advanced Compiler Design and Implementation, 1997.
[45] Chau-Wen Tseng et al. Compiler optimizations for improving data locality, 1994, ASPLOS VI.
[46] James Demmel et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[47] J. L. Hennessy et al. An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors, 1993, Supercomputing '93.
[48] Shang-Hua Teng et al. High performance Fortran for highly irregular problems, 1997, PPOPP '97.
[49] Guy L. Steele et al. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines, 1990, J. Parallel Distributed Comput.
[50] Sharad Malik et al. Cache miss equations: an analytical representation of cache misses, 1997, ICS '97.
[51] Linda Stals et al. Techniques For Improving The Data Locality Of Iterative Methods, 1997.
[52] Michael E. Wolf et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[53] Sharad Malik et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[54] Mark D. Hill et al. Surpassing the TLB performance of superpages with less operating system support, 1994.
[55] Kathryn S. McKinley et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[56] Chandra Krintz et al. Cache-conscious data placement, 1998, ASPLOS VIII.
[57] Mary E. Mace. Memory storage patterns in parallel processing, 1987, The Kluwer international series in engineering and computer science.
[58] Mithuna Thottethodi et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[59] Manish Gupta et al. Automatic Data Partitioning on Distributed Memory Multicomputers, 1992.
[60] Sandeep Sen et al. Towards a theory of cache-efficient algorithms, 2000, SODA '00.
[61] Karim Esseghir. Improving data locality for caches, 1993.
[62] James R. Larus et al. Improving Pointer-Based Codes Through Cache-Conscious Data Placement, 1998.
[63] Jack J. Dongarra et al. A proposal for a set of level 3 basic linear algebra subprograms, 1987, SGNM.
[64] Alan Jay Smith et al. Evaluating Associativity in CPU Caches, 1989, IEEE Trans. Computers.