Search Space Properties for Mapping Coarse-Grain Pipelined FPGA Applications
[1] R. Govindarajan,et al. A Vectorizing Compiler for Multimedia Extensions , 2000, International Journal of Parallel Programming.
[2] Mark Horowitz,et al. Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.
[3] Seth J. Teller,et al. The cricket compass for context-aware mobile applications , 2001, MobiCom '01.
[4] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[5] David B. Whalley,et al. Efficient and effective branch reordering using profile data , 2002, TOPL.
[6] Ken Kennedy,et al. Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries , 2001, J. Parallel Distributed Comput..
[7] David E. Culler,et al. System architecture directions for networked sensors , 2000, SIGP.
[8] R.K. Brunner,et al. Adapting to load on workstation clusters , 1999, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.
[9] Margaret H. Dunham,et al. Common Subexpression Processing in Multiple-Query Processing , 1998, IEEE Trans. Knowl. Data Eng..
[10] Keshav Pingali,et al. A case for source-level transformations in MATLAB , 1999, DSL '99.
[11] Mark N. Wegman,et al. Analysis of pointers and structures , 1990, SIGP.
[12] Umesh Kumar,et al. An Efficient Algorithm to Compute Delay Set in SPMD Programs , 2003, HiPC.
[13] Michael Voss,et al. Portable Compilers for OpenMP , 2001, WOMPAT.
[14] Keith D. Cooper,et al. Register promotion in C programs , 1997, PLDI '97.
[15] Etienne Morel,et al. Global optimization by suppression of partial redundancies , 1979, CACM.
[16] Yoichi Muraoka,et al. Measurements of parallelism in ordinary FORTRAN programs , 1974, Computer.
[17] Zhiyuan Li,et al. An Interprocedural Parallelizing Compiler and Its Support for Memory Hierarchy Research , 1995, LCPC.
[18] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language , 1992 .
[19] Karthik Gargi. A sparse algorithm for predicated global value numbering , 2002, PLDI '02.
[20] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[21] Joel H. Saltz,et al. Compiler techniques for data parallel applications using very large multi-dimensional datasets , 2001 .
[22] Jeremy D. Frens,et al. Language support for Morton-order matrices , 2001, PPoPP '01.
[23] Hari Balakrishnan,et al. 6th ACM/IEEE International Conference on on Mobile Computing and Networking (ACM MOBICOM ’00) The Cricket Location-Support System , 2022 .
[24] Ronald Minnich,et al. A network-failure-tolerant message-passing system for terascale clusters , 2002, ICS '02.
[25] Lars Ole Andersen,et al. Program Analysis and Specialization for the C Programming Language , 2005 .
[26] Gurindar S. Sohi,et al. Understanding the differences between value prediction and instruction reuse , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[27] Chialin Chang,et al. Parallel aggregation on multi-dimensional scientific datasets , 2001 .
[28] Fred Weber,et al. AMD 3DNow! technology: architecture and implementations , 1999, IEEE Micro.
[29] Robert S. Gray,et al. Agent Tcl: a Exible and Secure Mobile-agent System , 1996 .
[30] Bowen Alpern,et al. A model for hierarchical memory , 1987, STOC.
[31] Paul Feautrier,et al. Improving Data Locality by Chunking , 2003, CC.
[32] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[33] Allen,et al. Optimizing Compilers for Modern Architectures , 2004 .
[34] Laurie J. Hendren,et al. Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C , 1996, POPL '96.
[35] Richard M. Stallman,et al. Using and Porting the GNU Compiler Collection , 2000 .
[36] Bjarne Stroustrup,et al. The Design and Evolution of C , 1994 .
[37] David Grove,et al. Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis , 1995, ECOOP.
[38] George Cybenko,et al. D'Agents: Applications and performance of a mobile‐agent system , 2002, Softw. Pract. Exp..
[39] Takeo Kanade,et al. Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..
[40] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[41] Joel H. Saltz,et al. Efficient Execution of Multi-query Data Analysis Batches Using Compiler Optimization Strategies , 2003, LCPC.
[42] Liviu Iftode,et al. Spatial programming with smart messages for networks of embedded systems , 2002 .
[43] Marc Tremblay,et al. The visual instruction set (VIS) in UltraSPARC , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[44] Ken Kennedy,et al. Reducing and Vectorizing Procedures for Telescoping Languages , 2004, International Journal of Parallel Programming.
[45] Jack W. Davidson,et al. A study of a C function inliner , 1988, Softw. Pract. Exp..
[46] Jason Maassen,et al. Object-based collective communication in Java , 2001, JGI '01.
[47] Mark J. Clement,et al. DOGMA: Distributed Object Group Management Architecture , 1998 .
[48] Sungdo Moon,et al. Evaluation of predicated array data-flow analysis for automatic parallelization , 1999, PPoPP '99.
[49] I. A. Getting,et al. The Global Positioning System , 1993 .
[50] Andrea C. Arpaci-Dusseau,et al. Parallel programming in Split-C , 1993, Supercomputing '93. Proceedings.
[51] Roy Friedman,et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[52] Youfeng Wu,et al. Comprehensive Redundant Load Elimination for the IA-64 Architecture , 1999, LCPC.
[53] Pedro C. Diniz,et al. Coarse-grain pipelining on multiple FPGA architectures , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[54] Paul Feautrier,et al. Dataflow analysis of array and scalar references , 1991, International Journal of Parallel Programming.
[55] Paramvir Bahl,et al. RADAR: an in-building RF-based user location and tracking system , 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064).
[56] Deborah Estrin,et al. Directed diffusion: a scalable and robust communication paradigm for sensor networks , 2000, MobiCom '00.
[57] Charles N. Fischer,et al. Crafting a Compiler , 1988 .
[58] Harrick M. Vin,et al. Egida: an extensible toolkit for low-overhead fault-tolerance , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[59] Jeffrey Scott Vitter,et al. Algorithms for parallel memory, II: Hierarchical multilevel memories , 1992, Algorithmica.
[60] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[61] Michael Stonebraker,et al. The SEQUOIA 2000 Project , 1993, SSD.
[62] P. Feautrier. Some Eecient Solutions to the Aane Scheduling Problem Part Ii Multidimensional Time , 1992 .
[63] Lawrence Rauchwerger,et al. Standard Templates Adaptive Parallel Library (STAPL) , 1998, LCR.
[64] Scott A. Mahlke,et al. High-level synthesis of nonprogrammable hardware accelerators , 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.
[65] Pedro C. Diniz,et al. Bridging the Gap between Compilation and Synthesis in the DEFACTO System , 2001, LCPC.
[66] David A. Padua,et al. Monotonic evolution: an alternative to induction variable substitution for dependence analysis , 2001, ICS '01.
[67] Andreas Krall,et al. Compilation Techniques for Multimedia Processors , 2004, International Journal of Parallel Programming.
[68] Vivek Sarkar,et al. Array SSA form and its use in parallelization , 1998, POPL '98.
[69] Jason Maassen,et al. Ibis: an efficient Java-based grid programming environment , 2002, JGI '02.
[70] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[71] Jong-Deok Choi,et al. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects , 1993, POPL '93.
[72] D. Michie. “Memo” Functions and Machine Learning , 1968, Nature.
[73] Russell W. Quong,et al. ANTLR: A predicated‐LL(k) parser generator , 1995, Softw. Pract. Exp..
[74] Pedro C. Diniz,et al. A compiler approach to fast hardware design space exploration in FPGA-based systems , 2002, PLDI '02.
[75] Andrew Ayers,et al. Aggressive inlining , 1997, PLDI '97.
[76] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.
[77] Geoffrey C. Fox,et al. MpiJava: A Java Interface to MPI , 1998 .
[78] Adam J. Ferrari. JPVM: Network Parallel Computing in Java , 1997 .
[79] Alexandru Nicolau,et al. Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance , 2001, Algorithm Engineering.
[80] Saman Amarasinghe,et al. Parallelizing Compiler Techniques Based on Linear Inequalities , 1997 .
[81] Raymond Lo,et al. Register promotion by sparse partial redundancy elimination of loads and stores , 1998, PLDI.
[82] Linda G. DeMichiel,et al. Extending Relational Database Technology for New Applications , 1994, IBM Syst. J..
[83] William Pugh,et al. Counting solutions to Presburger formulas: how and why , 1994, PLDI '94.
[84] Masaru Tomita,et al. Efficient parsing for natural language , 1985 .
[85] Rudolf Eigenmann,et al. Supporting Realistic OpenMP Applications on a Commodity Cluster of Workstations , 2003, WOMPAT.
[86] Greg Burns,et al. LAM: An Open Cluster Environment for MPI , 2002 .
[87] Rajiv Gupta,et al. Load-reuse analysis: design and evaluation , 1999, PLDI '99.
[88] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[89] Philip Levis,et al. Maté: a tiny virtual machine for sensor networks , 2002, ASPLOS X.
[90] A. Harter,et al. A distributed location system for the active office , 1994, IEEE Network.
[91] Viktor Kuncak,et al. Role analysis , 2002, POPL '02.
[92] Hongjun Lu,et al. Workload Scheduling for Multiple Query Processing , 1995, Inf. Process. Lett..
[93] N. V. Kallur,et al. A Hierarchical Data Archiving and Processing System to Generate Custom Tailored Products From AVHRR Data , 2004 .
[94] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[95] Laxmikant V. Kalé,et al. Supporting dynamic parallel object arrays , 2003, Concurr. Comput. Pract. Exp..
[96] Ulrich Kremer. Compilers for power and energy management , 2002, ISLPED '02.
[97] M. Luisa Córdoba Cabeza,et al. CacheSim: a cache simulator for teaching memory hierarchy behaviour , 1999, ITiCSE '99.
[98] Jingling Xue. Automating Non-Unimodular Loop Transformations for Massive Parallelism , 1994, Parallel Comput..
[99] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[100] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[101] Michael Philippsen,et al. A more efficient RMI for Java , 1999, JAVA '99.
[102] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[103] Neville Churcher,et al. A Generated Parser of C , 2001 .
[104] W. Kelly,et al. Code generation for multiple mappings , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[105] W. Hwu,et al. Accurate and efficient predicate analysis with binary decision diagrams , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[106] Jeffrey Scott Vitter,et al. Algorithms for parallel memory, I: Two-level memories , 2005, Algorithmica.
[107] Mitsuhisa Sato,et al. Design of OpenMP Compiler for an SMP Cluster , 1999 .
[108] Henry G. Dietz,et al. Common Subexpression Induction , 1992, ICPP.
[109] Laurie J. Hendren,et al. Practical virtual method call resolution for Java , 2000, OOPSLA '00.
[110] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[111] Henry G. Dietz,et al. Compiling for SIMD Within a Register , 1998, LCPC.
[112] Gabriel Antoniu,et al. An Efficient and Transparent Thread Migration Scheme in the PM2 Runtime System , 1999, IPPS/SPDP Workshops.
[113] Kevin Skadron,et al. Power issues related to branch prediction , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[114] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[115] Zhen Fang,et al. The Impulse Memory Controller , 2001, IEEE Trans. Computers.
[116] Mary W. Hall,et al. Detecting Coarse - Grain Parallelism Using an Interprocedural Parallelizing Compiler , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[117] Saman P. Amarasinghe,et al. Meta optimization: improving compiler heuristics with machine learning , 2003, PLDI '03.
[118] Benoît Meister,et al. Automatic memory layout transformations to optimize spatial locality in parameterized loop nests , 2000, CARN.
[119] Daniel Marques,et al. Collective Operations in an Application-level Fault Tolerant MPI System , 2003 .
[120] Barbara G. Ryder,et al. Interprocedural modification side effect analysis with pointer aliasing , 1993, PLDI '93.
[121] Jong-Deok Choi,et al. Interprocedural pointer alias analysis , 1999, TOPL.
[122] Jeremy Manson,et al. JSR-133: Java Memory Model and Thread Specification , 2003 .
[123] Li Xu. Program redundancy analysis and optimization to improve memory performance , 2003 .
[124] Alexandru Nicolau,et al. A language for conveying the aliasing properties of dynamic, pointer-based data structures , 1994, Proceedings of 8th International Parallel Processing Symposium.
[125] Seth Copen Goldstein,et al. PipeRench: a co/processor for streaming multimedia acceleration , 1999, ISCA.
[126] David A. Carlson,et al. Multimedia extensions for a 550-MHz RISC microprocessor , 1997 .
[127] William Pugh,et al. Optimization within a unified transformation framework , 1996 .
[128] Ken Kennedy,et al. Parascope:a Parallel Programming Environment , 1988 .
[129] Laxmikant V. Kalé,et al. Emulating petaflops machines and blue gene , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[130] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[131] Pierre Boulet,et al. Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation , 1998, Parallel Comput..
[132] Rudolf Eigenmann,et al. Nonlinear and Symbolic Data Dependence Testing , 1998, IEEE Trans. Parallel Distributed Syst..
[133] Constantine D. Polychronopoulos,et al. The structure of parafrase-2: an advanced parallelizing compiler for C and FORTRAN , 1990 .
[134] Kenneth E. Batcher. STARAN parallel processor system hardware , 1974, AFIPS '74.
[135] Dirk Grunwald,et al. Reducing branch costs via branch alignment , 1994, ASPLOS VI.
[136] Michael E. Wolf,et al. Improving locality and parallelism in nested loops , 1992 .
[137] R. Ferreira,et al. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[138] Reinhard Wilhelm,et al. Solving shape-analysis problems in languages with destructive updating , 1998, TOPL.
[139] Yong Wang,et al. Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with ZebraNet , 2002, ASPLOS X.
[140] Wendi B. Heinzelman,et al. Adaptive protocols for information dissemination in wireless sensor networks , 1999, MobiCom.
[141] Hiroshi Nakamura,et al. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance , 1999, IEEE Trans. Computers.
[142] Csaba Andras Moritz,et al. Parallelizing applications into silicon , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).
[143] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .
[144] Rudolf Eigenmann,et al. The range test: a dependence test for symbolic, non-linear expressions , 1994, Proceedings of Supercomputing '94.
[145] Chris Hankin,et al. Abstract Interpretation of Declarative Languages , 1987 .
[146] Calvin Lin,et al. An annotation language for optimizing software libraries , 1999, DSL '99.
[147] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..
[148] Luiz A. DeRose,et al. Compiler techniques for MATLAB programs , 1996 .
[149] Rinku Gupta,et al. Static analysis of parame-terized loop nests for energy e?cient use of data caches , 2001 .
[150] Bruce A. Draper,et al. The Cameron Project: High-Level Programming of Image Processing Applications on Reconfigurable Computing Machines 1 , 1998 .
[151] Rafael Asenjo,et al. Accurate Shape Analysis for Recursive Data Structures , 2000, LCPC.
[152] David A. Padua,et al. Containers on the Parallelization of General-Purpose Java Programs , 2004, International Journal of Parallel Programming.
[153] Thorsten von Eicken,et al. Interfacing Java to the virtual interface architecture , 1999, JAVA '99.
[154] Steve Johnson,et al. Compiling C for vectorization, parallelization, and inline expansion , 1988, PLDI '88.
[155] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[156] Richard M. Karp,et al. The Organization of Computations for Uniform Recurrence Equations , 1967, JACM.
[157] T. Kurc,et al. Efficient Execution of Multiple Query Workloads in Data Analysis Applications , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[158] David A. Padua,et al. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers , 1995, ICS '95.
[159] Dennis Gannon,et al. Sage++: An Object-Oriented Toolkit and Class Library for Building Fortran and C++ Restructuring Tool , 1994 .
[160] Alexandru Nicolau,et al. A general data dependence test for dynamic, pointer-based data structures , 1994, PLDI '94.
[161] Pradeep K. Dubey,et al. How Multimedia Workloads Will Change Processor Design , 1997, Computer.
[162] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.
[163] Jason Maassen,et al. GMI: Flexible and Efficient Group Method Invocation for Parallel Programming , 2002 .
[164] Ulrich Kremer,et al. The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.
[165] Laxmikant V. Kalé,et al. A framework for collective personalized communication , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[166] Israel Koren,et al. Jmpi: Implementing The Message Passing Interface Standard In Java , 2000 .
[167] Dror Eliezer Maydan. Accurate analysis of array references , 1993 .
[168] Prasan Roy,et al. Efficient and extensible algorithms for multi query optimization , 1999, SIGMOD '00.
[169] Constantine D. Polychronopoulos,et al. Symbolic analysis for parallelizing compilers , 1996, TOPL.
[170] Emmett Witchel,et al. Increasing and detecting memory address congruence , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.
[171] Hanspeter Moessenboeck,et al. Coco/R - A Generator for Fast Compiler Front Ends , 1990 .
[172] David E. Culler,et al. The nesC language: A holistic approach to networked embedded systems , 2003, PLDI.
[173] Joel H. Saltz,et al. Compiling Data Intensive Applications with Spatial Coordinates , 2000, LCPC.
[174] James M. Rehg,et al. A Compilation Framework for Power and Energy Management on Mobile Computers , 2001, LCPC.
[175] David F. Bacon,et al. Fast static analysis of C++ virtual function calls , 1996, OOPSLA '96.
[176] David A. Padua,et al. MaJIC: compiling MATLAB for speed and responsiveness , 2002, PLDI '02.
[177] Monica S. Lam,et al. Efficient context-sensitive pointer analysis for C programs , 1995, PLDI '95.
[178] B. Ramakrishna Rau,et al. Efficient design space exploration in PICO , 2000, CASES '00.
[179] David A. Padua,et al. Techniques for the translation of MATLAB programs into Fortran 90 , 1999, TOPL.
[180] Laxmikant V. Kalé,et al. Run-Time Support for Adaptive Load Balancing , 2000, IPDPS Workshops.
[181] Pedro C. Diniz,et al. Compiler-generated communication for pipelined FPGA applications , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).
[182] Carl Ebeling,et al. Specifying and compiling applications for RaPiD , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).
[183] Monica S. Lam,et al. Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.
[184] Xin Yuan,et al. Branch Elimination via Multi-variable Condition Merging , 2003, Euro-Par.
[185] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[186] Thomas R. Gross,et al. Static conflict analysis for multi-threaded object-oriented programs , 2003, PLDI '03.
[187] Pedro C. Diniz,et al. Using estimates from behavioral synthesis tools in compiler-directed design space exploration , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).
[188] Saman P. Amarasinghe,et al. Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.
[189] David B. Whalley,et al. Avoiding conditional branches by code replication , 1995, PLDI '95.
[190] Keith D. Cooper,et al. An efficient static analysis algorithm to detect redundant memory operations , 2002, MSP/ISMM.
[191] Miron Livny,et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System , 1997 .
[192] Keith D. Cooper,et al. Value-driven redundancy elimination , 1996 .
[193] Diego Puppin. Convergent scheduling : a flexible and extensible scheduling framework for clustered VLIW architectures , 2002 .
[194] Xavier Martorell,et al. NanosCompiler: A Research Platform for OpenMP Extensions , 1999 .
[195] Reinhard Wilhelm,et al. Parametric shape analysis via 3-valued logic , 1999, POPL '99.
[196] Sava Mintchev. Writing Programs in JavaMPI , 1997 .
[197] Chau-Wen Tseng,et al. Compiler optimizations for eliminating barrier synchronization , 1995, PPOPP '95.
[198] Joel H. Saltz,et al. Exploiting functional decomposition for efficient parallel processing of multiple data analysis queries , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[199] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[200] William Pugh,et al. A practical algorithm for exact array dependence analysis , 1992, CACM.
[201] Erik Ruf,et al. Effective synchronization removal for Java , 2000, PLDI '00.
[202] Alok Aggarwal,et al. Hierarchical memory with block transfer , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[203] Mateo Valero,et al. Eliminating cache conflict misses through XOR-based placement functions , 1997, ICS '97.
[204] Ken Kennedy,et al. Automatic translation of FORTRAN programs to vector form , 1987, TOPL.
[205] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[206] Micah Beck,et al. Compiler-Assisted Checkpointing , 1994 .
[207] Mats Brorsson,et al. OdinMP/CCp - a portable implementation of OpenMP for C , 2000, Concurr. Pract. Exp..
[208] Jack Minker,et al. Multiple Query Processing in Deductive Databases using Query Graphs , 1986, VLDB.
[209] Ken Kennedy,et al. Improving register allocation for subscripted variables , 1990, SIGP.
[210] Saman Amarasinghe,et al. The suif compiler for scalable parallel machines , 1995 .
[211] Frank Pfenning,et al. Eliminating array bound checking through dependent types , 1998, PLDI.
[212] Alain Deutsch,et al. Interprocedural may-alias analysis for pointers: beyond k-limiting , 1994, PLDI '94.
[213] Anthony Skjellum,et al. A framework for high‐performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low‐level kernels , 2002, Concurr. Comput. Pract. Exp..
[214] Geoffrey C. Fox,et al. MPJ: MPI-like message passing for Java , 2000 .
[215] Isak Jonsson,et al. Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms , 1998, PARA.
[216] Mark S. Squillante,et al. Processor Allocation in Multiprogrammed Distributed-Memory Parallel Computer Systems , 1997, J. Parallel Distributed Comput..
[217] David A. Padua,et al. On the Automatic Parallelization of the Perfect Benchmarks , 1998, IEEE Trans. Parallel Distributed Syst..
[218] Joel H. Saltz,et al. Run-time and compile-time support for adaptive irregular problems , 1994, Proceedings of Supercomputing '94.
[219] Susmita Sur-Kolay,et al. Combined instruction and loop parallelism in array synthesis for FPGAs , 2001, International Symposium on System Synthesis (IEEE Cat. No.01EX526).
[220] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[221] Franco P. Preparata,et al. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds , 1999, Theory of Computing Systems.
[222] Samuel P. Midkiff,et al. Compiling programs with user parallelism , 1990 .
[223] Bernhard Steffen,et al. Lazy code motion , 1992, PLDI '92.
[224] Markus Schordan,et al. Parallel object‐oriented framework optimization , 2004, Concurr. Comput. Pract. Exp..
[225] David A. Padua,et al. Basic compiler algorithms for parallel programs , 1999, PPoPP '99.
[226] J. Cocke. Global common subexpression elimination , 1970, Symposium on Compiler Optimization.
[227] Keith D. Cooper,et al. Optimizing for reduced code space using genetic algorithms , 1999, LCTES '99.
[228] Roy Dz-Ching Ju,et al. A new algorithm for scalar register promotion based on SSA form , 1998, PLDI '98.
[229] Rudolf Eigenmann,et al. Polaris: A New-Generation Parallelizing Compiler for MPPs , 1993 .
[230] John Wawrzynek,et al. Adapting software pipelining for reconfigurable computing , 2000, CASES '00.
[231] Andy Hopper,et al. The Anatomy of a Context-Aware Application , 1999, Wirel. Networks.
[232] Martin Griebl,et al. Code generation in the polytope model , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[233] Larry Carter,et al. Memory hierarchy considerations for fast transpose and bit-reversals , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[234] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[235] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[236] George C. Necula,et al. The design and implementation of a certifying compiler , 1998, PLDI.
[237] Zhao Zhang,et al. Cache-Optimal Methods for Bit-Reversals , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[238] Vikram S. Adve,et al. High Performance Fortran Compilation Techniques for Parallelizing Scientific Codes , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[239] Yunheung Paek,et al. The Access Region Test , 1999, LCPC.
[240] Ken Kennedy,et al. A technique for summarizing data access and its use in parallelism enhancing transformations , 1989, PLDI '89.
[241] David E. Culler,et al. Jaguar: enabling efficient communication and I/O in Java , 2000 .
[242] Michael F. P. O'Boyle,et al. Feedback Assisted Iterative Compilation , 2000 .
[243] Rajiv Gupta,et al. Interprocedural conditional branch elimination , 1997, PLDI '97.
[244] Rudolf Eigenmann,et al. Idiom recognition in the Polaris parallelizing compiler , 1995, ICS '95.
[245] Andy Hopper,et al. Implementing a Sentient Computing System , 2001, Computer.
[246] Bharat K. Bhargava,et al. Multiple-Query Optimization at Algorithm-Level , 1994, Data Knowl. Eng..
[247] Paul H. J. Kelly,et al. An exhaustive evaluation of row-major, column-major and Morton layouts for large two-dimensional arrays , 2003 .
[248] Saman P. Amarasinghe,et al. Convergent scheduling , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[249] James R. Larus,et al. Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[250] Albert Cohen,et al. Putting Polyhedral Loop Transformations to Work , 2003, LCPC.
[251] Scott A. Mahlke,et al. Profile‐guided automatic inline expansion for C programs , 1992, Softw. Pract. Exp..
[252] Prithviraj Banerjee,et al. Static array storage optimization in MATLAB , 2003, PLDI '03.
[253] Jack W. Davidson,et al. Subprogram Inlining: A Study of its Effects on Program Execution Time , 1992, IEEE Trans. Software Eng..
[254] Wayne Luk,et al. Pipeline vectorization for reconfigurable systems , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).
[255] Utpal Banerjee,et al. Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.
[256] Toshiaki Yasue,et al. An Empirical Study of Method In-lining for a Java Just-in-Time Compiler , 2002, Java Virtual Machine Research and Technology Symposium.
[257] Siegfried Benkner. VFC: The Vienna Fortran Compiler , 1999, Sci. Program..
[258] Jim Waldo,et al. The Jini architecture for network-centric computing , 1999, CACM.
[259] Kenneth Steiglitz,et al. Testing for cycles in infinite graphs with periodic structure , 1987, STOC.
[260] Liviu Iftode,et al. Toward a security architecture for smart messages: challenges, solutions, and open issues , 2003, 23rd International Conference on Distributed Computing Systems Workshops, 2003. Proceedings..
[261] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .
[262] Steven G. Johnson,et al. The Fastest Fourier Transform in the West , 1997 .
[263] Dennis Gannon,et al. HPC++: experiments with the parallel standard template library , 1997, ICS '97.
[264] Daniel Marques,et al. C3: A System for Automating Application-Level Checkpointing of MPI Programs , 2003, LCPC.
[265] Markus Schordan,et al. Treating a user-defined parallel library as a domain-specific language , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[266] Michael F. P. O'Boyle,et al. MARS: A Distributed Memory Approach to Shared Memory Compilation , 1998, LCR.
[267] Sanjay V. Rajopadhye,et al. Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.
[268] Ken Kennedy,et al. Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPL.
[269] Christoforos E. Kozyrakis,et al. How to solve the current memory access and data transfer bottlenecks: at the processor architecture or at the compiler level , 2000, DATE '00.
[270] Robert Scheifler,et al. An analysis of inline substitution for a structured programming language , 1977, CACM.
[271] Cheryl McCosh,et al. Type-based specialization in a telescoping compiler for Matlab , 2003 .
[272] Liviu Iftode,et al. Cooperative computing for distributed embedded systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.
[273] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[274] James R. Larus,et al. Detecting conflicts between structure accesses , 1988, PLDI '88.
[275] Ken Kennedy,et al. Automatic Type-Driven Library Generation for Telescoping Languages , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[276] Thomas R. Gross,et al. Exploiting task and data parallelism on a multicomputer , 1993, PPOPP '93.
[277] Ken Kennedy,et al. Practical dependence testing , 1991, PLDI '91.
[278] Katsunobu Muroi,et al. A SIMDizing C Compiler for the Mitsubishi Electric Neuro4 Processor Array , 1996 .
[279] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[280] Edith Cohen,et al. Strongly polynomial-time and NC algorithms for detecting cycles in periodic graphs , 1993, JACM.
[281] Rajesh Gupta. Architectural adaptation in AMRM machines , 2000, Proceedings IEEE Computer Society Workshop on VLSI 2000. System Design for a System-on-Chip Era.
[282] David A. Padua,et al. Issues in the Optimization of Parallel Programs , 1990, ICPP.
[283] Rainer Leupers,et al. Function inlining under code size constraints for embedded processors , 1999, 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051).
[284] Alan Jay Smith,et al. Design and characterization of the Berkeley multimedia workload , 2002, Multimedia Systems.
[285] David Detlefs,et al. Inlining of Virtual Methods , 1999, ECOOP.
[286] Jack J. Dongarra,et al. Vectorizing compilers: a test suite and results , 1988, Proceedings. SUPERCOMPUTING '88.
[287] Mahmut T. Kandemir,et al. Influence of compiler optimizations on system power , 2000, Proceedings 37th Design Automation Conference.
[288] Aart J. C. Bik,et al. Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.
[289] Indranil Gupta,et al. On scalable and efficient distributed failure detectors , 2001, PODC '01.
[290] William Adjie-Winoto,et al. The design and implementation of an intentional naming system , 2000, OPSR.
[291] Geoffrey C. Fox,et al. Parallel Computing Works , 1994 .
[292] Andrew A. Chien,et al. Analysis of Dynamic Structures for Efficient Parallel Execution , 1993, LCPC.
[293] James R. Larus,et al. Branch prediction for free , 1993, PLDI '93.
[294] Aart J. C. Bik,et al. Automatic Detection of Saturation and Clipping Idioms , 2002, LCPC.
[295] James E. Smith,et al. A study of branch prediction strategies , 1981, ISCA '81.
[296] Manfred P. Stadel,et al. A variation of Knoop, Rüthing, and Steffen's Lazy Code Motion , 1993, SIGP.
[297] Thomas M. Conte,et al. Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[298] Apan Qasem,et al. Improving Performance with Integrated Program Transformations , 2004 .
[299] Mahmut T. Kandemir,et al. The design and use of simplePower: a cycle-accurate energy estimation tool , 2000, Proceedings 37th Design Automation Conference.
[300] Pierre Jouvelot,et al. Semantical interprocedural parallelization: an overview of the PIPS project , 1991 .
[301] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[302] Wen-mei W. Hwu,et al. Run-time Adaptive Cache Hierarchy Via Reference Analysis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[303] Tao Yang,et al. Program transformation and runtime support for threaded MPI execution on shared-memory machines , 2000, TOPL.
[304] Joel H. Saltz,et al. Active Proxy-G: Optimizing the Query Execution Process in the Grid , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[305] Volker Strumpen,et al. Portable Checkpointing for Heterogeneous Architectures , 1997, International Symposium on Fault-Tolerant Computing.
[306] Miodrag Potkonjak,et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[307] Monica S. Lam,et al. Efficient and exact data dependence analysis , 1991, PLDI '91.
[308] Dirk Grunwald,et al. Reducing indirect function call overhead in C++ programs , 1994, POPL '94.
[309] Jameela Al-Jaroodi,et al. A comparative study of parallel and distributed Java projects for heterogeneous systems , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[310] David Grove,et al. Adaptive online context-sensitive inlining , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[311] M. Schlansker,et al. On Predicated Execution , 1991 .
[312] Yunheung Paek,et al. Parallel Programming with Polaris , 1996, Computer.
[313] Ken Kennedy,et al. Optimizing strategies for telescoping languages: procedure strength reduction and procedure vectorization , 2001, ICS '01.
[314] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[315] Liviu Iftode,et al. Self-routing in pervasive computing environments using smart messages , 2003, Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, 2003. (PerCom 2003).
[316] Ruby B. Lee,et al. Mapping of application software to the multimedia instructions of general-purpose microprocessors , 1997, Electronic Imaging.