Software thread integration for instruction-level parallelism
Won So | Alexander G. Dean
[1] Alexander G. Dean. Compiling for fine-grain concurrency: planning and performing software thread integration, 2002, Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.
[2] Ken Kennedy et al. Parallel Programming Support in ParaScope, 1988, Parallel Computing in Science and Engineering.
[3] Alan E. Charlesworth et al. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family, 1981, Computer.
[4] Thomas Way et al. Using Path Spectra to Direct Function Cloning, 1998.
[5] Bennett B. Goldberg et al. Trimaran - An Infrastructure for Compiler Research in Instruction Level Parallelism, 1998.
[6] Ken Kennedy et al. A Methodology for Procedure Cloning, 1993, Computer Languages.
[7] Monica S. Lam et al. Interprocedural Analysis for Parallelization, 1995, LCPC.
[8] Junqiang Sun et al. TMS320C6000 CPU and Instruction Set Reference Guide, 2000.
[9] Huiyang Zhou et al. Code size efficiency in global scheduling for ILP processors, 2002, Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.
[10] Steve Johnson et al. Compiling C for vectorization, parallelization, and inline expansion, 1988, PLDI '88.
[11] Joe D. Warren et al. The program dependence graph and its use in optimization, 1987, TOPL.
[12] William Thies et al. Phased scheduling of stream programs, 2003, LCTES '03.
[13] Wen-mei W. Hwu et al. Applying Scalable Interprocedural Pointer Analysis to Embedded Applications, 2004.
[14] Z. Greenfield et al. The TigerSHARC DSP Architecture, 2000, IEEE Micro.
[15] Yunheung Paek et al. Parallel Programming with Polaris, 1996, Computer.
[16] Scott Mahlke et al. Three Superblock Scheduling Models for Superscalar and Superpipelined Processors, 1991.
[17] Philip H. Sweany et al. Loop fusion for clustered VLIW architectures, 2002, LCTES/SCOPES '02.
[18] David Mosberger et al. IA-64 Linux Kernel: Design and Implementation, 2002.
[19] Margarida F. Jacome et al. Compiler-directed ILP extraction for clustered VLIW/EPIC machines: predication, speculation and modulo scheduling, 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.
[20] Won So et al. Complementing software pipelining with software thread integration, 2005, LCTES '05.
[21] David W. Wall et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[22] Corinna G. Lee et al. Software pipelining loops with conditional branches, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[23] Henry Hoffmann et al. StreamIt: A Compiler for Streaming Applications, 2002.
[24] John Paul Shen et al. Techniques for software thread integration in real-time embedded systems, 1998, Proceedings 19th IEEE Real-Time Systems Symposium (Cat. No.98CB36279).
[25] Scott A. Mahlke et al. Reverse If-Conversion, 1993, PLDI '93.
[26] Ken Kennedy et al. ParaScope: A Parallel Programming Environment, 1988.
[27] David Grove et al. Selective specialization for object-oriented languages, 1995, PLDI '95.
[28] Won So et al. Procedure cloning and integration for converting parallelism from coarse to fine grain, 2003, Seventh Workshop on Interaction Between Compilers and Computer Architectures, 2003. INTERACT-7 2003. Proceedings.
[29] Miodrag Potkonjak et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[30] Wen-mei W. Hwu et al. Modulo scheduling of loops in control-intensive non-numeric programs, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[31] Alexander Aiken et al. Perfect Pipelining: A New Loop Parallelization Technique, 1988, ESOP.
[32] Ken Kennedy et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[33] J. Janardhan et al. Enhanced region scheduling on a program dependence graph, 1992, MICRO 25.
[34] William Thies et al. Linear analysis and optimization of stream programs, 2003, PLDI '03.
[35] Milind Girkar et al. Parafrase-2: an Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors, 1989, Int. J. High Speed Comput.
[36] Ernst L. Leiss et al. Modulo scheduling for the TMS320C6x VLIW DSP architecture, 1999, LCTES '99.
[37] Haitham Akkary et al. A dynamic multithreading processor, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[38] Wen-mei W. Hwu et al. Inline function expansion for compiling C programs, 1989, PLDI '89.
[39] Jack W. Davidson et al. Subprogram Inlining: A Study of its Effects on Program Execution Time, 1992, IEEE Trans. Software Eng.
[40] Thomas Way et al. Demand-driven Inlining Heuristics in a Region-based Optimizing Compiler for ILP Architectures, 2001.
[41] Joseph A. Fisher et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.
[42] Ken Kennedy et al. Improving register allocation for subscripted variables, 1990, PLDI '90.
[43] Kathryn S. McKinley et al. Compiling for Heterogeneous System: A Survey and an Approach, 1995.
[44] Paul Le Guernic et al. SIGNAL: A declarative language for synchronous programming of real-time systems, 1987, FPCA.
[45] Ken Kennedy et al. Conversion of control dependence to data dependence, 1983, POPL '83.
[46] Gérard Berry et al. The Esterel Synchronous Programming Language: Design, Semantics, Implementation, 1992, Sci. Comput. Program.
[47] B. Ramakrishna Rau et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, 1981, MICRO 14.
[48] Y. Hu et al. Programmable Digital Signal Processor (PDSP): A Survey, 2003.
[49] Michael F. P. O'Boyle et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation, 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[50] K. Yelick et al. Generating Permutation Instructions from a High-Level Description, 2004.
[51] A. Aiken et al. Loop Quantization: an Analysis and Algorithm, 1987.
[52] Steve Carr et al. Unroll-and-jam using uniformly generated sets, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[53] Krishna Subramanian et al. Enhanced modulo scheduling for loops with conditional branches, 1992, MICRO 25.
[54] Henry Hoffmann et al. A stream compiler for communication-exposed architectures, 2002, ASPLOS X.
[55] Jian Wang et al. GURPR - a method for global software pipelining, 1987, MICRO 20.
[56] Bede Liu et al. Understanding multimedia application characteristics for designing programmable media processors, 1998, Electronic Imaging.
[57] Ken Kennedy et al. Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.
[58] Thomas M. Conte et al. Treegion scheduling for wide issue processors, 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[59] Alexander G. Dean et al. Software thread integration for hardware to software migration, 2000.
[60] Albert Cohen et al. Deep jam: conversion of coarse-grain parallelism to instruction-level and vector parallelism for irregular applications, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[61] Saurabh Sharma et al. Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors, 2001, HiPC.
[62] Won So et al. Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP, 2006, CASES '06.
[63] Todd A. Proebsting et al. Filter fusion, 1996, POPL '96.
[64] Scott Mahlke et al. Effective compiler support for predicated execution using the hyperblock, 1992, MICRO 1992.
[65] B. Ramakrishna Rau et al. Efficient code generation for horizontal architectures: Compiler techniques and architectural support, 1982, ISCA '82.
[66] Philip H. Sweany et al. Optimizing loop performance for clustered VLIW architectures, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[67] Jian Huang et al. The Superthreaded Processor Architecture, 1999, IEEE Trans. Computers.
[68] John M. Mellor-Crummey et al. FIAT: A Framework for Interprocedural Analysis and Transformation, 1993, LCPC.
[69] William Thies et al. StreamIt: A Language for Streaming Applications, 2002, CC.
[70] Richard A. Huff et al. Lifetime-sensitive modulo scheduling, 1993, PLDI '93.
[71] Robert Stephens et al. A survey of stream processing, 1997, Acta Informatica.
[72] L. Almagor et al. Finding effective compilation sequences, 2004, LCTES '04.
[73] Scott A. Mahlke et al. Effective compiler support for predicated execution using the hyperblock, 1992, MICRO 25.
[74] Trevor Mudge et al. MiBench: A free, commercially representative embedded benchmark suite, 2001.
[75] Philip H. Sweany et al. Improving software pipelining with unroll-and-jam, 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.
[76] Andrew Wolfe et al. A variable instruction stream extension to the VLIW architecture, 1991, ASPLOS IV.
[77] Rainer Leupers et al. Function inlining under code size constraints for embedded processors, 1999, 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051).
[78] Emden R. Gansner et al. Drawing graphs with dot, 2006.
[79] Chris J. Newburn et al. Exploiting Multi-Grained Parallelism for Multiple-Instruction-Stream Architectures, 1997.
[80] William J. Dally et al. Imagine: Media Processing with Streams, 2001, IEEE Micro.
[81] Krishna Subramanian et al. Enhanced modulo scheduling for loops with conditional branches, 1992, MICRO 1992.
[82] Gurindar S. Sohi et al. Multiscalar processors, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[83] Manoj Franklin et al. The multiscalar architecture, 1993.
[84] Lex Augusteijn et al. Instruction Scheduling for TriMedia, 1999, J. Instr. Level Parallelism.
[85] Henry Hoffmann et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, 2002, IEEE Micro.
[86] Mary Hall. Managing interprocedural optimization, 1992.
[87] Pascal Raymond et al. The synchronous data flow programming language LUSTRE, 1991, Proc. IEEE.
[88] Guilherme Ottoni et al. From sequential programs to concurrent threads, 2006, IEEE Computer Architecture Letters.
[89] Vicki H. Allan et al. Enhanced region scheduling on a program dependence graph, 1992, MICRO 1992.
[90] Michael F. P. O'Boyle et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation, 2004, The Journal of Supercomputing.
[91] Scott A. Mahlke et al. The superblock: An effective technique for VLIW and superscalar compilation, 1993, The Journal of Supercomputing.
[92] Dean M. Tullsen et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[93] Alexander G. Dean et al. Compiling for fine-grain concurrency: planning and performing software thread integration, 2002, 23rd IEEE Real-Time Systems Symposium, 2002. RTSS 2002.
[94] Michael Hind et al. Which pointer analysis should I use?, 2000, ISSTA '00.
[95] Geoffrey Brown et al. Lx: a technology platform for customizable VLIW embedded processing, 2000, ISCA '00.
[96] Siddhartha Shivshankar et al. Asynchronous software thread integration for efficient software implementations of embedded communication protocol controllers, 2004.
[97] Monica S. Lam et al. Interprocedural parallelization analysis in SUIF, 2005, TOPL.
[98] Rajiv Gupta et al. Region Scheduling: An Approach for Detecting and Redistributing Parallelism, 1990, IEEE Trans. Software Eng.