Optimal versus Heuristic Global Code Scheduling
[1] William J. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, DAC '01.
[2] Sebastian Winkel,et al. ILP-based Instruction Scheduling for IA-64 , 2001 .
[3] Josep Torrellas,et al. A Chip-Multiprocessor Architecture with Speculative Multithreading , 1999, IEEE Trans. Computers.
[4] Norman P. Jouppi,et al. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.
[5] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[6] Scott A. Mahlke,et al. Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[7] Joseph A. Fisher,et al. Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.
[8] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[9] Michael C. Huang,et al. Dynamically Tuning Processor Resources with Adaptive Processing , 2003, Computer.
[10] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.
[11] Kishore N. Menezes,et al. Wavefront scheduling: path based data representation and scheduling of subgraphs , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[12] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[13] Kent Wilken,et al. Optimal instruction scheduling using integer programming , 2000, PLDI.
[14] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[15] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .
[16] Dharma P. Agrawal,et al. Generalized Hypercube and Hyperbus Structures for a Computer Network , 1984, IEEE Transactions on Computers.
[17] B. Ramakrishna Rau,et al. EPIC: An Architecture for Instruction-Level Parallel Processors , 2000 .
[18] Antonio González,et al. Energy-effective issue logic , 2001, ISCA 2001.
[19] Anant Agarwal,et al. Versatility and VersaBench: A New Metric and a Benchmark Suite for Flexible Architectures , 2004 .
[20] S. Winkel. Optimal global instruction scheduling for the Itanium processor architecture , 2004 .
[21] Toshihide Ibaraki,et al. Resource allocation problems - algorithmic approaches , 1988, MIT Press series in the foundations of computing.
[22] Simha Sethumadhavan,et al. Late-binding: enabling unordered load-store queues , 2007, ISCA '07.
[23] Balaram Sinharoy,et al. POWER5 system microarchitecture , 2005, IBM J. Res. Dev..
[24] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[25] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[26] William J. Dally,et al. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.
[27] Anant Agarwal,et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..
[28] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[29] Soo-Mook Moon,et al. Parallelizing nonnumerical code with selective scheduling and software pipelining , 1997, TOPL.
[30] Laurence A. Wolsey,et al. Integer and Combinatorial Optimization , 1988 .
[31] Norman P. Jouppi,et al. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[32] Huiyang Zhou,et al. Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors , 2001, LCPC.
[33] Norman P. Jouppi,et al. Conjoined-Core Chip Multiprocessing , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[34] José González,et al. Back-end assignment schemes for clustered multithreaded processors , 2004, ICS '04.
[35] Doug Burger,et al. Implementation and Evaluation of On-Chip Network Architectures , 2006, 2006 International Conference on Computer Design.
[36] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002 .
[37] Rakesh Krishnaiyer,et al. An Overview of the Intel® IA-64 Compiler , 1999 .
[38] Karthikeyan Sankaralingam,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003 .
[39] David H. Albonesi,et al. Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[40] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[41] William J. Dally. Virtual-Channel Flow Control , 1992, IEEE Trans. Parallel Distributed Syst..
[42] Scott Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.
[43] Robert A. van de Geijn,et al. High performance dense linear algebra on a spatially distributed processor , 2008, PPoPP.
[44] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.
[45] Brad Calder,et al. Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[46] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[47] Michael Rodeh,et al. Global instruction scheduling for superscalar machines , 1991, PLDI '91.
[48] Scott A. Mahlke,et al. Characterizing the impact of predicated execution on branch prediction , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[49] Laurence A. Wolsey,et al. Integer and Combinatorial Optimization , 1988, Wiley interscience series in discrete mathematics and optimization.
[50] Daniel Kästner. PROPAN: A Retargetable System for Postpass Optimisations and Analyses , 2000, LCTES.
[51] Gürhan Küçük,et al. Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources , 2001, MICRO.
[52] Niraj K. Jha,et al. Express virtual channels: towards the ideal interconnection fabric , 2007, ISCA '07.
[53] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[54] Chita R. Das,et al. ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[55] Simha Sethumadhavan,et al. Distributed Microarchitectural Protocols in the TRIPS Prototype Processor , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[56] William J. Dally,et al. Microarchitecture of a High-Radix Router , 2005, ISCA 2005.
[57] Sebastian Winkel,et al. Exploring the performance potential of Itanium® processors with ILP-based scheduling , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[58] Kunle Olukotun,et al. A Scalable, Non-blocking Approach to Transactional Memory , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[59] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[60] Srilatha Manne,et al. Power and energy reduction via pipeline balancing , 2001, ISCA 2001.
[61] Lizy Kurian John,et al. Scaling to the end of silicon with EDGE architectures , 2004, Computer.
[62] Saurabh Dighe,et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[63] Yale N. Patt,et al. Partitioned first-level cache design for clustered microarchitectures , 2003, ICS '03.
[64] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[65] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[66] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.