Cilk: efficient multithreaded computing
[1] P. Stenstrom. VLSI support for a cactus stack oriented memory organization , 1988, Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences. Volume I: Architecture Track.
[2] David E. Culler,et al. Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine , 1991, ASPLOS IV.
[3] Charles E. Leiserson,et al. Efficient Detection of Determinacy Races in Cilk Programs , 1997, SPAA '97.
[4] F. Leighton,et al. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes , 1991 .
[5] Kirk L. Johnson,et al. CRL: high-performance all-software distributed shared memory , 1995, SOSP.
[6] Joel Moses. The function of FUNCTION in LISP or why the FUNARG problem should be called the environment problem , 1970, SIGS.
[7] Monica S. Lam,et al. The design and evaluation of a shared object system for distributed memory machines , 1994, OSDI '94.
[8] Robert D. Blumofe,et al. Executing multithreaded programs efficiently , 1995 .
[9] Edith Schonberg,et al. Detecting access anomalies in programs with critical sections , 1991, PADD '91.
[10] Mustaque Ahamad,et al. Implementing and programming causal distributed shared memory , 1991, Proceedings of the 11th International Conference on Distributed Computing Systems.
[11] Charles E. Leiserson,et al. Detecting data races in Cilk programs that use locks , 1998, SPAA '98.
[12] M. F.,et al. Bibliography , 1985, Experimental Gerontology.
[13] Andrew V. Goldberg,et al. A new approach to the maximum flow problem , 1986, STOC '86.
[14] Robert H. B. Netzer,et al. Efficient Race Condition Detection for Shared-Memory Programs with Post/Wait Synchronization , 1992, International Conference on Parallel Processing.
[15] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[16] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.
[17] David S. Wise. Representing Matrices as Quadtrees for Parallel Processors , 1985, Inf. Process. Lett..
[18] Monica Beltrametti,et al. The control mechanism for the Myrias parallel computer system , 1988 .
[19] Henri E. Bal,et al. Programming a distributed system using shared objects , 1993, Proceedings of the 2nd International Symposium on High Performance Distributed Computing.
[20] David A. Padua,et al. Event synchronization analysis for debugging parallel programs , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[21] Matteo Frigo,et al. An analysis of dag-consistent distributed shared-memory algorithms , 1996, SPAA '96.
[22] John M. Mellor-Crummey,et al. On-the-fly detection of data races for programs with nested fork-join parallelism , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[23] Richard P. Brent,et al. The Parallel Evaluation of General Arithmetic Expressions , 1974, JACM.
[24] Leslie Lamport,et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 1979, IEEE Transactions on Computers.
[25] Martin C. Rinard,et al. Commutativity analysis: a new analysis framework for parallelizing compilers , 1996, PLDI '96.
[26] James R. Goodman,et al. Cache Consistency and Sequential Consistency , 1991 .
[27] Guy E. Blelloch,et al. Programming parallel algorithms , 1996, CACM.
[28] Paul Hudak,et al. Memory coherence in shared virtual memory systems , 1986, PODC '86.
[29] Dirk Grunwald. Heaps o' Stacks: Time and Space Efficient Threads Without Operating System Support , 1994 .
[30] Peter J. Denning,et al. Operating Systems Theory , 1973 .
[31] Guillermo J. Rozas,et al. Garbage Collection is Fast, but a Stack is Faster , 1994 .
[32] Brian N. Bershad,et al. The Midway distributed shared memory system , 1993, Digest of Papers. Compcon Spring.
[33] Ronald L. Graham,et al. Bounds on Multiprocessing Timing Anomalies , 1969, SIAM Journal on Applied Mathematics.
[34] Matteo Frigo,et al. The weakest reasonable memory model , 1998 .
[35] Michael Burrows,et al. Eraser: a dynamic data race detector for multi-threaded programs , 1997, TOCS.
[36] Benjamin A. Dent,et al. Burroughs' B6500/B7500 stack mechanism , 1968, AFIPS '68 (Spring).
[37] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[38] V. Strassen. Gaussian elimination is not optimal , 1969 .
[39] Andrea C. Arpaci-Dusseau,et al. Parallel programming in Split-C , 1993, Supercomputing '93. Proceedings.
[40] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[41] Barton P. Miller,et al. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions , 1990, ICPP.
[42] Rishiyur S. Nikhil,et al. Parallel Symbolic Computing in Cid , 1995, PSLS.
[43] Srinivasan Parthasarathy,et al. Cashmere-2L: software coherent shared memory on a clustered remote-write network , 1997, SOSP.
[44] Dirk Grunwald,et al. Whole-program optimization for time and space efficient threads , 1996, ASPLOS VII.
[45] David Singmaster,et al. Notes on Rubik's 'Magic Cube' , 1981 .
[46] Andrew W. Appel,et al. Empirical and Analytic Study of Stack Versus Heap Cost for Languages with Closures , 1996, J. Funct. Program..
[47] Charles E. McDowell,et al. Analyzing Traces with Anonymous Synchronization , 1989, ICPP.
[48] Vivek Sarkar,et al. Location Consistency: Stepping Beyond the Memory Coherence Barrier , 1995, ICPP.
[49] Brian N. Bershad,et al. Software write detection for a distributed shared memory , 1994, OSDI '94.
[50] Jeffrey S. Chase,et al. The Amber system: parallel programming on a network of multiprocessors , 1989, SOSP '89.
[51] Richard F. Barrett,et al. Matrix Market: a web resource for test matrix collections , 1996, Quality of Numerical Software.
[52] Willy Zwaenepoel,et al. Implementation and performance of Munin , 1991, SOSP '91.
[53] Seth Copen Goldstein,et al. Lazy Threads: Implementing a Fast Parallel Call , 1996, J. Parallel Distributed Comput..
[54] James R. Larus,et al. LCM: memory system support for parallel language implementation , 1994, ASPLOS VI.
[55] Jong-Deok Choi,et al. A Mechanism for Efficient Debugging of Parallel Programs , 1988, PLDI.
[56] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[57] Victor Luchangco,et al. Computation-centric memory models , 1998, SPAA '98.
[58] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[59] Jong-Deok Choi,et al. An efficient cache-based access anomaly detection scheme , 1991, ASPLOS IV.
[60] Gregory R. Andrews,et al. Distributed filaments: efficient fine-grain parallelism on a cluster of workstations , 1994, OSDI '94.
[61] Monica S. Lam,et al. Jade: a high-level, machine-independent language for parallel programming , 1993, Computer.
[62] M. Hill,et al. Weak ordering-a new definition , 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[63] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[64] Edsger W. Dijkstra,et al. Solution of a problem in concurrent programming control , 1965, CACM.
[65] G. Andrew Boughton. Arctic Routing Chip , 1994, PCRCW.
[66] Monica Beltrametti,et al. The control mechanism for the Myrias parallel computer system , 1988, CARN.
[67] Mark D. Hill,et al. Weak ordering—a new definition , 1998, ISCA '98.
[68] Robert C. Miller,et al. A type-checking preprocessor for Cilk 2, a multithreaded C language , 1995 .
[69] James C. Hoe. StarT-X - A One-Man-Year Exercise in Network Interface Engineering , 1998 .
[70] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[71] Peter J. Keleher,et al. Online data-race detection via coherency guarantees , 1996, OSDI '96.
[72] Richard C. Holt,et al. Some deadlock properties of computer systems , 1971, SOSP '71.
[73] Christopher F. Joerg,et al. The Cilk system for parallel multithreaded computing , 1996 .
[74] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.
[75] Edith Schonberg,et al. An empirical comparison of monitoring algorithms for access anomaly detection , 1990, PPOPP '90.
[76] Matteo Frigo,et al. DAG-consistent distributed shared memory , 1996, Proceedings of International Conference on Parallel Processing.
[77] Anant Agarwal,et al. Software-extended coherent shared memory: performance and cost , 1994, ISCA '94.
[78] Richard M. Karp,et al. Parallel Algorithms for Shared-Memory Machines , 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.
[79] Robert E. Tarjan,et al. Applications of Path Compression on Balanced Trees , 1979, JACM.
[80] Victor Luchangco,et al. Precedence-Based Memory Models , 1997, WDAG.
[81] Piyush Mehrotra,et al. The BLAZE language: A parallel language for scientific programming , 1987, Parallel Comput..
[82] A. Agarwal,et al. Software-extended coherent shared memory: performance and cost , 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[83] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[84] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[85] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.
[86] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.