Optimizing graph algorithms for improved cache performance
[1] Sandeep Sen,et al. Towards a theory of cache-efficient algorithms , 2000, SODA '00.
[2] Alfred V. Aho,et al. The Design and Analysis of Computer Algorithms , 1974 .
[3] Viktor K. Prasanna,et al. Analysis of memory hierarchy performance of block data layout , 2002, Proceedings International Conference on Parallel Processing.
[4] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[5] Viktor K. Prasanna,et al. Dynamic data layouts for cache-conscious factorization of DFT , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[6] Viktor K. Prasanna,et al. Cache conscious Walsh-Hadamard transform , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[7] Ellis Horowitz,et al. Fundamentals of Computer Algorithms , 1978 .
[8] David A. Patterson,et al. Computer architecture (2nd ed.): a quantitative approach , 1996 .
[9] Peter Sanders,et al. Fast priority queues for cached memory , 1999, JEAL.
[10] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[11] Sartaj Sahni,et al. A Blocked All-Pairs Shortest-Path Algorithm , 2000, SWAT.
[12] Sartaj Sahni,et al. A blocked all-pairs shortest-paths algorithm , 2003, ACM J. Exp. Algorithmics.
[13] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[14] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[15] Siddhartha Chatterjee,et al. Cache-efficient matrix transposition , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[16] Sartaj Sahni,et al. Data Structures, Algorithms and Applications in Java , 1998 .
[17] Sabih H. Gerez,et al. Algorithms for VLSI design automation , 1998 .
[18] Peter M. Kogge,et al. The Characterization of Data Intensive Memory Workloads on Distributed PIM Systems , 2000, Intelligent Memory Systems.
[19] Wilson C. Hsieh,et al. Impulse: Memory system support for scientific applications , 1999, Sci. Program..
[20] Michael Brenner,et al. Multiagent Planning with Partially Ordered Temporal Plans , 2003, IJCAI.
[21] Mithuna Thottethodi,et al. Nonlinear array layouts for hierarchical memory systems , 1999, ICS '99.
[22] James R. Larus,et al. Cache-conscious structure layout , 1999, PLDI '99.
[23] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[24] Sally A. McKee,et al. Caches as filters: a new approach to cache analysis , 1998, Proceedings. Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.98TB100247).
[25] Sunita Sarawagi,et al. On computing the data cube , 1996 .
[26] Christos H. Papadimitriou,et al. On the Floyd-Warshall Algorithm for Logic Programs , 1999, J. Log. Program..
[27] Mateo Valero,et al. Eliminating cache conflict misses through XOR-based placement functions , 1997, ICS '97.
[28] Mehryar Mohri,et al. A weight pushing algorithm for large vocabulary speech recognition , 2001, INTERSPEECH.
[29] Miodrag Potkonjak,et al. Exposure in wireless Ad-Hoc sensor networks , 2001, MobiCom '01.
[30] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[31] Mahmut T. Kandemir,et al. Improving Cache Locality by a Combination of Loop and Data Transformation , 1999, IEEE Trans. Computers.
[32] Peter J. Varman,et al. Optimal prefetching and caching for parallel I/O systems , 2001, SPAA '01.
[33] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[34] Yves Robert,et al. Loop partitioning versus tiling for cache-based multiprocessors , 1998 .
[35] Viktor K. Prasanna,et al. Tiling, Block Data Layout, and Memory Hierarchy Performance , 2003, IEEE Trans. Parallel Distributed Syst..
[36] Joon-Sang Park,et al. Optimizing graph algorithms for improved cache performance , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[37] Nikil D. Dutt,et al. Memory data organization for improved cache performance in embedded processor applications , 1997, TODE.
[38] Alex C. Mueller,et al. The SPIRAL project , 1995 .
[39] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[40] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[41] Richard E. Ladner,et al. The influence of caches on the performance of heaps , 1996, JEAL.
[42] Hai Jin,et al. Parallel I/O Systems , 2002 .
[43] M. Kanehisa,et al. Extraction of correlated gene clusters by multiple graph comparison. , 2001, Genome informatics. International Conference on Genome Informatics.
[44] Mihalis Yannakakis,et al. Graph-theoretic methods in database theory , 1990, PODS.
[45] Dimitri P. Bertsekas,et al. Data Networks , 1986 .
[46] Alok N. Choudhary. Parallel I/O Systems - Guest Editor's Introduction , 1993, J. Parallel Distributed Comput..