PFunc: modern task parallelism for modern high performance computing
Andrew Lumsdaine | Amol Ghoting | Prabhanjan Kambadur | Haim Avron | Anshul Gupta
[1] M. Hestenes,et al. Methods of conjugate gradients for solving linear systems , 1952 .
[2] Michael J. Flynn,et al. Some Computer Organizations and Their Effectiveness , 1972, IEEE Transactions on Computers.
[3] Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
[4] Iain S. Duff,et al. The Multifrontal Solution of Unsymmetric Sets of Linear Equations , 1984 .
[5] Robert H. Halstead,et al. MULTILISP: a language for concurrent symbolic computation , 1985, TOPL.
[6] Nicolas Halbwachs,et al. LUSTRE: a declarative language for real-time programming , 1987, POPL '87.
[7] Piyush Mehrotra. Programming Parallel Architectures: The BLAZE Family of Languages-Invited Talk , 1987, PPSC.
[8] Frederica Darema,et al. A single-program-multiple-data computational model for EPEX/FORTRAN , 1988, Parallel Comput..
[9] R. L. Wexelblat. Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation , 1988, PLDI 1989.
[10] Alexander A. Stepanov,et al. Generic Programming , 1988, ISSAC.
[11] Vivek Sarkar,et al. Partitioning and Scheduling Parallel Programs for Multiprocessing , 1989 .
[12] I. Foster,et al. Strand: A practical parallel programming language , 1989 .
[13] Martín Abadi,et al. Composing Specifications , 1989, REX Workshop.
[14] Murray Cole,et al. Algorithmic Skeletons: Structured Management of Parallel Computation , 1989 .
[15] David C. Cann,et al. A Report on the Sisal Language Project , 1990, J. Parallel Distributed Comput..
[16] Behrooz Shirazi,et al. Analysis and Evaluation of Heuristic Methods for Static Task Scheduling , 1990, J. Parallel Distributed Comput..
[17] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, IEEE Trans. Parallel Distributed Syst..
[18] Ken Kennedy,et al. An Overview of the Fortran D Programming System , 1991, LCPC.
[19] Olivier Danvy,et al. Representing Control: a Study of the CPS Transformation , 1992, Mathematical Structures in Computer Science.
[20] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language , 1992 .
[21] Kai Hwang,et al. Advanced computer architecture - parallelism, scalability, programmability , 1992 .
[22] Tao Yang,et al. A Comparison of Clustering Heuristics for Scheduling Directed Acyclic Graphs on Multiprocessors , 1992, J. Parallel Distributed Comput..
[23] Eerke Albert Boiten,et al. Transformational derivation of (parallel) programs using skeletons , 1993 .
[24] Peter G. Harrison,et al. Parallel Programming Using Skeleton Functions , 1993, PARLE.
[25] CHARM++: A Portable Concurrent Object Oriented System Based On C++ , 1993, OOPSLA.
[26] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[27] Tomasz Imielinski,et al. Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.
[28] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language (Version 2.6) , 1993 .
[29] K. Mani Chandy,et al. CC++: A Declarative Concurrent Object Oriented Programming Notation , 1993 .
[30] Robert Olson,et al. Programming in FORTRAN M , 1993 .
[31] Thomas R. Gross,et al. Task Parallelism in a High Performance Fortran Framework , 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[32] George Karypis,et al. Introduction to Parallel Computing , 1994 .
[33] Robert J. Harrison,et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers , 1994, Proceedings of Supercomputing '94.
[34] Ramakrishnan Srikant,et al. Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.
[35] Steven M. Hadfield. On The LU Factorization Of Sequences Of Identically Structured Sparse Matrices Within A Distributed , 1994 .
[36] Barbara M. Chapman,et al. Extending HPF for Advanced Data-Parallel Applications , 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[37] Lawrence Rauchwerger,et al. Automatic Detection of Parallelism: A grand challenge for high performance computing , 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[38] John A. Chandy,et al. The Paradigm Compiler for Distributed-Memory Multicomputers , 1995, Computer.
[39] Yike Guo,et al. Parallel skeletons for structured composition , 1995, PPOPP '95.
[40] K. Mani Chandy,et al. Fortran M: A Language for Modular Parallel Programming , 1995, J. Parallel Distributed Comput..
[41] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..
[42] Jeffrey M. Squyres,et al. Object Oriented MPI (OOMPI): a class library for the Message Passing Interface , 1996, Proceedings. Second MPI Developer's Conference.
[43] Bradford Nichols,et al. Pthreads programming , 1996 .
[44] Ian T. Foster,et al. Compositional parallel programming languages , 1996, TOPL.
[45] Herbert Kuchen,et al. TPascal - A Language for Task Parallel Programming , 1996, Euro-Par, Vol. I.
[46] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[47] Rakesh Agrawal,et al. Parallel Mining of Association Rules , 1996, IEEE Trans. Knowl. Data Eng..
[48] Piyush Mehrotra,et al. Vienna Fortran and the Path Towards a Standard Parallel Language (Special Issue on Parallel and Distributed Supercomputing) , 1997 .
[49] Srinivasan Parthasarathy,et al. New Algorithms for Fast Discovery of Association Rules , 1997, KDD.
[50] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[51] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[52] Jeffrey J. P. Tsai,et al. Compositional verification of concurrent systems using Petri-net-based condensation rules , 1998, TOPL.
[53] Jin-Soo Kim,et al. Memory characterization of a parallel data mining workload , 1998, Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization.
[54] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[55] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[56] James Demmel,et al. An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination , 1997, SIAM J. Matrix Anal. Appl..
[57] C. Leiserson,et al. Scheduling multithreaded computations by work stealing , 1999, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[58] Sergei Gorlatch,et al. Skeletons and Transformations in an Integrated Parallel Programming Environment , 1999, PaCT.
[59] Jeremy G. Siek,et al. The Matrix Template Library: generic components for high-performance scientific computing , 1999, Comput. Sci. Eng..
[60] Ishfaq Ahmad,et al. Benchmarking and Comparison of the Task Graph Scheduling Algorithms , 1999, J. Parallel Distributed Comput..
[61] Y.-K. Kwok,et al. Static scheduling algorithms for allocating directed task graphs to multiprocessors , 1999, CSUR.
[62] Mohammed J. Zaki. Parallel and distributed association mining: a survey , 1999, IEEE Concurr..
[63] Krzysztof Czarnecki,et al. Generative programming - methods, tools and applications , 2000 .
[64] Michael A. Bender,et al. Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk , 2002, SPAA '00.
[65] Jian Pei,et al. Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.
[66] Vipin Kumar,et al. Scalable Parallel Data Mining for Association Rules , 2000, IEEE Trans. Knowl. Data Eng..
[67] Alexander A. Stepanov,et al. C++ Standard Template Library , 2000 .
[68] Srinivasan Parthasarathy,et al. Parallel Data Mining for Association Rules on Shared-memory Systems , 1998 .
[69] Nancy M. Amato,et al. STAPL: A Standard Template Adaptive Parallel C++ Library , 2001 .
[70] Michael Wolf,et al. Object‐oriented analysis and design of the Message Passing Interface , 2001, Concurr. Comput. Pract. Exp..
[71] Nancy M. Amato,et al. STAPL: An Adaptive, Generic Parallel C++ Library , 2001, LCPC.
[72] Dennis Gannon,et al. HPC++ and the HPC++Lib Toolkit , 2001, Compiler Optimizations for Scalable Parallel Systems Languages.
[73] David R. Musser,et al. STL tutorial and reference guide , 2001 .
[74] Frederica Darema,et al. The SPMD Model : Past, Present and Future , 2001, PVM/MPI.
[75] Anshul Gupta,et al. Recent advances in direct methods for solving unsymmetric sparse systems of linear equations , 2002, TOMS.
[76] Herbert Kuchen,et al. A Skeleton Library , 2002, Euro-Par.
[77] Anshul Gupta,et al. Improved Symbolic and Numerical Factorization Algorithms for Unsymmetric Sparse Matrices , 2002, SIAM J. Matrix Anal. Appl..
[78] Peter Sanders,et al. Δ-stepping: a parallelizable shortest path algorithm , 2003, J. Algorithms.
[79] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[80] Bart Goethals,et al. Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004) , 2004 .
[81] Joost N. Kok,et al. A quickstart in frequent structure mining can make a difference , 2004, KDD.
[82] Murray Cole,et al. Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming , 2004, Parallel Comput..
[83] Ken Kennedy,et al. Defining and Measuring the Productivity of Programming Languages , 2004, Int. J. High Perform. Comput. Appl..
[84] Srinivasan Parthasarathy,et al. Parallel algorithms for mining frequent structural motifs in scientific data , 2004, ICS '04.
[85] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[86] Jeffrey C. Carver,et al. Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[87] Srinivasan Parthasarathy,et al. Adaptive Parallel Graph Mining for CMP Architectures , 2006, Sixth International Conference on Data Mining (ICDM'06).
[88] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[89] Srinivasan Parthasarathy,et al. Cache-conscious frequent pattern mining on modern and emerging processors , 2007, The VLDB Journal.
[90] Andrew Lumsdaine,et al. Modernizing the C++ Interface to MPI , 2006, PVM/MPI.
[91] Bjarne Stroustrup,et al. Specifying C++ concepts , 2006, POPL '06.
[92] Jonathan W. Berry,et al. Software and Algorithms for Graph Queries on Multithreaded Architectures , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[93] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[94] Alejandro Duran,et al. A Proposal for Task Parallelism in OpenMP , 2007, IWOMP.
[95] Phillip Colella,et al. Parallel Languages and Compilers: Perspective From the Titanium Experience , 2007, Int. J. High Perform. Comput. Appl..
[96] Jeffrey C. Carver,et al. Software Development Environments for Scientific and Engineering Software: A Series of Case Studies , 2007, 29th International Conference on Software Engineering (ICSE'07).
[97] Jay Hoeflinger. Programming with Cluster OpenMP , 2007, PPOPP.
[98] Andrew Lumsdaine,et al. Parallelization of Generic Libraries Based on Type Properties , 2007, International Conference on Computational Science.
[99] Victor Luchangco,et al. The Fortress Language Specification Version 1.0 , 2007 .
[100] Lawrence Snyder,et al. The design and development of ZPL , 2007, HOPL.
[101] Jonathan W. Berry,et al. Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..
[102] Claudia Fohry,et al. Problems, Workarounds and Possible Solutions Implementing the Singleton Pattern with C++ and OpenMP , 2007, IWOMP.
[103] James Reinders,et al. Intel® threading building blocks , 2008 .
[104] Peter M. Kogge,et al. On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications , 2007, IEEE Transactions on Computers.
[105] Sriram Krishnamoorthy,et al. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing , 2008, 2008 37th International Conference on Parallel Processing.
[106] Verdi March,et al. Survey on Parallel Programming Model , 2008, NPC.
[107] Andrew Lumsdaine,et al. OpenMP Extensions for Generic Libraries , 2008, IWOMP.
[108] Andrew Lumsdaine,et al. Design and implementation of a high-performance MPI for C# and the common language infrastructure , 2008, PPOPP.
[109] George Almási,et al. Performance without pain = productivity: data layout and collective communication in UPC , 2008, PPoPP.
[110] Taiichi Yuasa,et al. Backtracking-based load balancing , 2009, PPoPP '09.
[111] Shirish Tatikonda,et al. Mining Tree-Structured Data on Multicore Systems , 2009, Proc. VLDB Endow..
[112] Andrew Lumsdaine,et al. Extending Task Parallelism For Frequent Pattern Mining , 2012, PARCO.
[113] Edward T. Grochowski,et al. Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[114] Yi Guo,et al. Work-first and help-first scheduling policies for async-finish task parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[115] Torsten Hoefler,et al. Demand-driven execution of static directed acyclic graphs using task parallelism , 2009, 2009 International Conference on High Performance Computing (HiPC).
[116] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.