Trace-Based Dynamic Binary Parallelization
[1] Zhao Zhang,et al. Software thermal management of DRAM memory for multicore systems , 2008, SIGMETRICS '08.
[2] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[3] Keith D. Cooper,et al. An Experimental Evaluation of List Scheduling , 1998 .
[4] Quinn Jacobson,et al. Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[5] Guang R. Gao,et al. Identifying loops using DJ graphs , 1996, TOPLAS.
[6] Jing Yang,et al. Dimension: an instrumentation tool for virtual execution environments , 2006, VEE '06.
[7] Antonia Zhai,et al. Improving value communication for thread-level speculation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[8] Rajeev Barua,et al. Automatic Parallelization in a Binary Rewriter , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[9] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[10] Teresa H. Meng,et al. Embracing heterogeneity: parallel programming for changing hardware , 2009 .
[11] Gary A. Kildall,et al. A unified approach to global program optimization , 1973, POPL.
[12] Saurabh Dighe,et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[13] Sanjay J. Patel,et al. rePLay: A Hardware Framework for Dynamic Optimization , 2001, IEEE Trans. Computers.
[14] Nathan Clark. Why Should I Rewrite My Software When Dynamic Compilation Can Be Good Enough? , 2008 .
[15] Vasanth Bala,et al. Dynamo: a transparent dynamic optimization system , 2000, SIGPLAN Notices.
[16] Jason Mars,et al. MATS: Multicore Adaptive Trace Selection , 2008 .
[17] Rajiv Gupta,et al. Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[18] Babak Falsafi,et al. Flexible Hardware Acceleration for Instruction-Grain Program Monitoring , 2008, 2008 International Symposium on Computer Architecture.
[19] Wen-mei W. Hwu,et al. Automatic Discovery of Coarse-Grained Parallelism in Media Applications , 2007, Trans. High Perform. Embed. Archit. Compil..
[20] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Scott A. Mahlke,et al. Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[22] Gregory J. Chaitin,et al. Register allocation and spilling via graph coloring , 2004, SIGPLAN Notices.
[23] Cheng Wang,et al. Selective Runtime Memory Disambiguation in a Dynamic Binary Translator , 2006, CC.
[24] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[25] Kevin Skadron,et al. Federation: Repurposing scalar cores for out-of-order instruction issue , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[26] Samuel T. King,et al. Debugging Operating Systems with Time-Traveling Virtual Machines , 2005, USENIX Annual Technical Conference, General Track.
[27] Scott A. Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.
[28] Gurindar S. Sohi,et al. Speculative Multithreaded Processors , 2001, Computer.
[29] Michael D. Smith,et al. Generational Cache Management of Code Traces in Dynamic Optimization Systems , 2003, MICRO.
[30] Dirk Grunwald,et al. Instruction fetch mechanisms for multipath execution processors , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[31] Wei Hu,et al. Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems , 2007, CGO.
[32] Rohit Chandra,et al. Parallel programming in OpenMP , 2000 .
[33] Richard Johnson,et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[34] Urs Hölzle,et al. High-efficiency power supplies for home computers and servers , 2006 .
[35] Sriram Sankaranarayanan,et al. Integrating ICP and LRA solvers for deciding nonlinear real arithmetic problems , 2010, Formal Methods in Computer Aided Design.
[36] Michael Franz,et al. Dynamic parallelization and mapping of binary executables on hierarchical platforms , 2006, CF '06.
[37] Weifeng Zhang,et al. An event-driven multithreaded dynamic optimization framework , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[38] John R. Ellis,et al. Bulldog: A Compiler for VLIW Architectures , 1986 .
[39] Erik R. Altman,et al. DAISY: Dynamic Compilation for 100% Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[40] Rastislav Bodík,et al. Path-sensitive value-flow analysis , 1998, POPL '98.
[41] Michael Gschwind. The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor , 2007, International Journal of Parallel Programming.
[42] Tarek S. Abdelrahman,et al. The use of hardware transactional memory for the trace-based parallelization of recursive Java programs , 2009, PPPJ '09.
[43] Tarek S. Abdelrahman,et al. Automatic Trace-Based Parallelization of Java Programs , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).
[44] Martin Burtscher,et al. VPC3: a fast and effective trace-compression algorithm , 2004, SIGMETRICS '04/Performance '04.
[45] A. Moffat,et al. Offline dictionary-based compression , 2000, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).
[46] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[47] Qin Zhao,et al. Pipa: pipelined profiling and analysis on multi-core systems , 2008, CGO 2008.
[48] Andreas Podelski,et al. Thread-Modular Counterexample-Guided Abstraction Refinement , 2010, SAS.
[49] Tarek S. Abdelrahman,et al. The potential of trace-level parallelism in Java programs , 2007, PPPJ.
[50] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[51] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[52] Mark Stephenson,et al. Convergent scheduling , 2002, MICRO 35.
[53] Todd C. Mowry,et al. Flexible Hardware Acceleration for Instruction-Grain Program Monitoring , 2008 .
[54] Westley Weimer,et al. The road not taken: Estimating path execution frequency statically , 2009, 2009 IEEE 31st International Conference on Software Engineering.
[55] Sanjay J. Patel,et al. The Performance Potential of Trace-based Dynamic Optimization , 2004 .
[56] James E. Smith,et al. Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[57] Seon Wook Kim,et al. Runtime parallelization of legacy code on a transactional memory system , 2011, HiPEAC.
[58] Rudolf Eigenmann,et al. Min-cut program decomposition for thread-level speculation , 2004, PLDI '04.
[59] Kunle Olukotun,et al. Runtime automatic speculative parallelization , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[60] Apala Guha,et al. Balancing memory and performance through selective flushing of software code caches , 2010, CASES '10.
[61] Wei Liu,et al. Thread-Level Speculation on a CMP can be energy efficient , 2005, ICS '05.
[62] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[63] Scott A. Mahlke,et al. Compiler-managed partitioned data caches for low power , 2007, LCTES '07.
[64] Guilherme Ottoni,et al. Global Multi-Threaded Instruction Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[65] Lingjia Tang,et al. Directly characterizing cross core interference through contention synthesis , 2011, HiPEAC.
[66] Gary S. Tyson,et al. Region-based caching: an energy-delay efficient memory architecture for embedded processors , 2000, CASES '00.
[67] K. Ebcioglu,et al. DAISY: Dynamic Compilation for 100% Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[68] J. Larus. Whole program paths , 1999, PLDI '99.
[69] Anant Agarwal,et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[70] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[71] Ishfaq Ahmad,et al. Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors , 1996, IEEE Trans. Parallel Distributed Syst..
[72] Onur Mutlu,et al. Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.
[73] Stefania Perri,et al. Fast Low-Cost Implementation of Single-Clock-Cycle Binary Comparator , 2008, IEEE Transactions on Circuits and Systems II: Express Briefs.
[74] Chen Ding,et al. Software behavior oriented parallelization , 2007, PLDI '07.
[75] Chi-Keung Luk,et al. Memory disambiguation for general-purpose applications , 1995, CASCON.
[76] Richard Johnson,et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, CGO.
[77] Rajeev Balasubramonian,et al. Towards scalable, energy-efficient, bus-based on-chip networks , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[78] Easwaran Raman,et al. Parallel-stage decoupled software pipelining , 2008, CGO '08.
[79] Scott A. Mahlke,et al. The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.
[80] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[81] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[82] Mikko H. Lipasti,et al. Value locality and load value prediction , 1996, ASPLOS VII.
[83] Wei Liu,et al. POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.
[84] Manoj Franklin,et al. A general compiler framework for speculative multithreading , 2002, SPAA '02.
[85] Scott A. Mahlke,et al. Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[86] Anshuman Dasgupta. Vizer: A framework to analyze and vectorize Intel x86 binaries , 2003 .
[87] Margaret Martonosi,et al. Multipath execution: opportunities and limits , 1998, ICS '98.
[88] Thomas A. Henzinger,et al. The Blast Query Language for Software Verification , 2004, SAS.
[89] Greg Grohoski. Niagara-2: A highly threaded server-on-a-chip , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[90] Derek Bruening,et al. Secure Execution via Program Shepherding , 2002, USENIX Security Symposium.
[91] Diego R. Llanos Ferraris,et al. Just-In-Time Scheduling for Loop-based Speculative Parallelization , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[92] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[93] Min Xu,et al. A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.
[94] Rajeev Barua,et al. An optimal memory allocation scheme for scratch-pad-based embedded systems , 2002, TECS.
[95] Kevin Skadron,et al. Characterizing and removing branch mispredictions , 1999 .
[96] Jack W. Davidson,et al. Evaluating fragment construction policies for SDT systems , 2006, VEE '06.
[97] Weng-Fai Wong,et al. Cooperative Instruction Scheduling with Linear Scan Register Allocation , 2005, HiPC.
[98] Brad Calder,et al. Threaded multiple path execution , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[99] James E. Smith,et al. The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[100] Michael J. Quinn,et al. Parallel programming in C with MPI and OpenMP , 2003 .
[101] Antonio González,et al. Clustered speculative multithreaded processors , 1999, ICS '99.
[102] Philippe Clauss,et al. Polyhedral parallelization of binary code , 2012, TACO.
[103] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[104] Wen-mei W. Hwu,et al. A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[105] Norman P. Jouppi,et al. CACTI 3.0: an integrated cache timing, power, and area model , 2001 .
[106] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.
[107] Easwaran Raman,et al. Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[108] William J. Dally,et al. Evaluating the Imagine stream architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[109] Nancy M. Amato,et al. STAPL: standard template adaptive parallel library , 2010, SYSTOR '10.
[110] Xiangyu Zhang,et al. Whole Execution Traces , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[111] Malay K. Ganai,et al. Efficient decision procedure for non-linear arithmetic constraints using CORDIC , 2009, 2009 Formal Methods in Computer-Aided Design.
[112] Sanjay J. Patel,et al. Increasing the size of atomic instruction blocks using control flow assertions , 2000, MICRO 33.
[113] Edward T. Grochowski,et al. Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[114] Sorin Lerner,et al. ESP: path-sensitive program verification in polynomial time , 2002, PLDI '02.
[115] Scott A. Mahlke,et al. Superblock formation using static program analysis , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[116] Alan Jay Smith,et al. Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.
[117] Derek Bruening,et al. An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[118] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[119] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[120] Nathan Clark,et al. Commutativity analysis for software parallelization: letting program transformations see the big picture , 2009, ASPLOS.
[121] Keith D. Cooper,et al. Coloring register pairs , 1992, LOPLAS.
[122] Rakesh Ranjan,et al. Fg-STP: Fine-Grain Single Thread Partitioning on Multicores , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[123] Takashi Yokota,et al. Preliminary evaluation of a binary translation system for multithreaded processors , 2002, International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems.
[124] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[125] Kwong-Sak Leung,et al. CPE: a parallel library for financial engineering applications , 2005, Computer.
[126] Peter Marwedel,et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[127] Wei Liu,et al. Dynamic parallelization of single-threaded binary programs using speculative slicing , 2009, ICS.
[128] Glenn Reinman,et al. Selective value prediction , 1999, ISCA.
[129] Joseph A. Fisher,et al. Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.
[130] Jack W. Davidson,et al. Secure and practical defense against code-injection attacks using software dynamic translation , 2006, VEE '06.
[131] Gang Chen,et al. Effective instruction scheduling with limited registers , 2001 .