TORCH Computational Reference Kernels - A Testbed for Computer Science Research
Samuel Williams | James Demmel | Erich Strohmaier | David H. Bailey | Khaled Z. Ibrahim | Kamesh Madduri | Alexander D. Kaiser
[1] Samuel Williams,et al. Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[2] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[3] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[4] David H. Bailey,et al. Random Generators and Normal Numbers , 2002, Exp. Math..
[5] Martin Aigner,et al. Sorting by insertion of leading elements , 1987, J. Comb. Theory, Ser. A.
[6] Jeffrey Scott Vitter,et al. Efficient sorting using registers and caches , 2000, JEAL.
[7] Keshav Pingali,et al. Lonestar: A suite of parallel irregular programs , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[8] Oliver Günther,et al. Multidimensional access methods , 1998, CSUR.
[9] David A. Bader,et al. Approximating Betweenness Centrality , 2007, WAW.
[10] Jaeyoung Choi,et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines , 1994, Sci. Program..
[11] C. Pomerance,et al. Prime Numbers: A Computational Perspective , 2002 .
[12] Ulrich Meyer,et al. A computational study of external-memory BFS algorithms , 2006, SODA '06.
[13] Jack Dongarra,et al. Special Issue on Program Generation, Optimization, and Platform Adaptation , 2005, Proc. IEEE.
[14] S. McCormick,et al. A multigrid tutorial (2nd ed.) , 2000 .
[15] Gerhard Wellein,et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[16] Gene H. Golub,et al. Matrix computations (3rd ed.) , 1996 .
[17] Glenn Reinman,et al. ParallAX: an architecture for real-time physics , 2007, ISCA '07.
[18] Torsten Hoefler,et al. A space-efficient parallel algorithm for computing betweenness centrality in distributed memory , 2010, 2010 International Conference on High Performance Computing.
[19] Katherine A. Yelick,et al. Multi-threading and one-sided communication in parallel LU factorization , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[20] Guy E. Blelloch,et al. An Experimental Analysis of Parallel Sorting Algorithms , 1998, Theory of Computing Systems.
[21] J. Demmel,et al. A Testing Infrastructure for LAPACK's Symmetric Eigensolvers , 2007 .
[22] Guy E. Blelloch,et al. A comparison of sorting algorithms for the connection machine CM-2 , 1991, SPAA '91.
[23] Derek G. Corneil,et al. Parallel computations in graph theory , 1975, 16th Annual Symposium on Foundations of Computer Science (sfcs 1975).
[24] P. J. Narayanan,et al. Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.
[25] David A. Bader. Designing Scalable Synthetic Compact Applications for Benchmarking High Productivity Computing Systems , 2006 .
[26] Yen-Kuang Chen,et al. The ALPBench benchmark suite for complex multimedia applications , 2005, Proceedings of the IEEE International Workload Characterization Symposium, 2005.
[27] David A. Bader,et al. Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[28] Julien Langou,et al. Parallel tiled QR factorization for multicore architectures , 2007, Concurr. Comput. Pract. Exp..
[29] Matemática,et al. Society for Industrial and Applied Mathematics , 2010 .
[30] Berkin Özisikyilmaz,et al. MineBench: A Benchmark Suite for Data Mining Workloads , 2006, 2006 IEEE International Symposium on Workload Characterization.
[31] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[32] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991 .
[33] Samuel Williams,et al. A Kernel Testbed for Parallel Architecture, Language, and Performance Research , 2010 .
[34] Michael Garland,et al. Designing efficient sorting algorithms for manycore GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[35] W. M. Gentleman,et al. Fast Fourier Transforms: for fun and profit , 1966, AFIPS '66 (Fall).
[36] Vipin Kumar,et al. Scalable parallel formulations of the Barnes-Hut method for n-body simulations , 1994, Supercomputing '94.
[37] Samuel Williams,et al. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[38] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[39] David A. Bader,et al. Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2 , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[40] David B. Yoffie,et al. Intel Corporation 2005 , 2005 .
[41] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .
[42] M. F.,et al. Bibliography , 1985, Experimental Gerontology.
[43] Sirpa Mäki. Formulas for computing ξA , 1980 .
[44] J. Anthonisse. The rush in a directed graph , 1971 .
[45] Linton C. Freeman,et al. A set of measures of centrality based upon betweenness , 1977 .
[46] Alan George,et al. QR Factorization of a Dense Matrix on a Hypercube Multiprocessor , 1990, SIAM J. Sci. Comput..
[47] Peter Sanders,et al. Better Approximation of Betweenness Centrality , 2008, ALENEX.
[48] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[49] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[50] David Eppstein,et al. Fast approximation of centrality , 2000, SODA '01.
[51] David A. Bader,et al. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets , 2009 .
[52] Mihalis Yannakakis,et al. High-probability parallel transitive closure algorithms , 1990, SPAA '90.
[53] Richard P. Martin,et al. Fast parallel sorting under LogP: from theory to practice , 1993 .
[54] Richard E. Crandall,et al. Large-scale FFTs and convolutions on Apple hardware , 2008 .
[55] David A. Bader,et al. BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications , 2005, Proceedings of the IEEE International Workload Characterization Symposium, 2005.
[56] Piet Hut,et al. A hierarchical O(N log N) force-calculation algorithm , 1986, Nature.
[57] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[58] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[59] Jon Louis Bentley,et al. Engineering a sort function , 1993, Softw. Pract. Exp..
[60] Becky Verastegui,et al. Proceedings of the 2007 ACM/IEEE conference on Supercomputing , 2007, HiPC 2007.
[61] Samuel Williams,et al. Lattice Boltzmann simulation optimization on leading multicore platforms , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[62] Ulrik Brandes,et al. On variants of shortest-path betweenness centrality and their generic computation , 2008, Soc. Networks.
[63] James Demmel,et al. Minimizing communication in sparse matrix solvers , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[64] David H. Bailey,et al. Performance results for two of the NAS parallel benchmarks , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[65] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[66] Jonathan M. Borwein,et al. Advances in the theory of box integrals , 2010, Math. Comput..
[67] Philip S. Yu,et al. CellSort: High Performance Sorting on the Cell Processor , 2007, VLDB.
[68] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[69] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[70] Kenneth E. Batcher,et al. Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.
[71] David H. Bailey. A High-Performance FFT Algorithm for Vector Supercomputers , 1987, PPSC.
[72] U. Brandes. A faster algorithm for betweenness centrality , 2001 .
[73] Jeffrey Scott Vitter,et al. A Simple and Efficient Parallel Disk Mergesort , 2002, Theory of Computing Systems.
[74] David A. Bader,et al. Practical parallel algorithms for personalized communication and integer sorting , 1996, JEAL.
[75] Edward A. Lee,et al. The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View , 2008 .
[76] Fabrizio Petrini,et al. Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[77] Richard E. Crandall,et al. Prime Numbers: A Computational Perspective , 2005 .
[78] Janez Brest,et al. A sorting algorithm on a PC cluster , 2000, SAC '00.
[79] Edmond Chow,et al. A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[80] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[81] Robert H. Halstead,et al. Matrix Computations , 2011, Encyclopedia of Parallel Computing.
[82] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[83] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[84] William H. Press,et al. Numerical Recipes 3rd Edition: The Art of Scientific Computing , 2007 .
[85] Pradeep Dubey,et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.
[86] Jeffrey Scott Vitter,et al. Efficient Sorting Using Registers and Caches , 2000, Algorithm Engineering.
[87] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[88] Bülent Abali,et al. Balanced Parallel Sort on Hypercube Multiprocessors , 1993, IEEE Trans. Parallel Distributed Syst..
[89] Michael J. Quinn,et al. Parallel graph algorithms , 1984, CSUR.
[90] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[91] David H. Bailey,et al. FFTs in external or hierarchical memory , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[92] C. Loan. Computational Frameworks for the Fast Fourier Transform , 1992 .