Customizing VLIW processors from dynamically profiled execution traces
[1] Norman P. Jouppi, et al. CACTI 6.0: A Tool to Model Large Caches, 2009.
[2] Marcel Gort, et al. Range and bitmask analysis for hardware optimization in high-level synthesis, 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).
[3] Gustavo de Veciana, et al. Application-specific clustered VLIW datapaths: early exploration on a parameterized design space, 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[4] Michael D. Smith, et al. Boosting beyond static scheduling in a superscalar processor, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[5] Gorker Alp Malazgirt, et al. Application specific multi-port memory customization in FPGAs, 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).
[6] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.
[7] Kevin B. Theobald, et al. On the limits of program parallelism and its smoothability, 1992, MICRO 1992.
[8] William Pugh, et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[9] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.
[10] Monica S. Lam, et al. Efficient and exact data dependence analysis, 1991, PLDI '91.
[11] Alexandru Nicolau, et al. Measuring the Parallelism Available for Very Long Instruction Word Architectures, 1984, IEEE Transactions on Computers.
[12] Alexandru Nicolau, et al. Using an oracle to measure potential parallelism in single instruction stream programs, 1981, MICRO 14.
[13] Alan Dain Samples, et al. Profile-Driven Compilation, 1991.
[14] Monica S. Lam, et al. Efficient context-sensitive pointer analysis for C programs, 1995, PLDI '95.
[15] Jinian Bian, et al. Automatic enhanced CDFG generation based on runtime instrumentation, 2013, Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD).
[16] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[17] Chaitali Chakrabarti, et al. Multi-Module Multi-Port Memory Design for Low Power Embedded Systems, 2004, Des. Autom. Embed. Syst.
[18] Preeti Ranjan Panda, et al. Shared-port register file architecture for low-energy VLIW processors, 2014, TACO.
[19] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.
[20] Arquimedes Canedo, et al. Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture, 2011, The Journal of Supercomputing.
[21] Michael D. Smith, et al. Limits on multiple instruction issue, 1989, ASPLOS III.
[22] Arda Yurdakul, et al. An Efficient Heterogeneous Register File Implementation for FPGAs, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[23] Henk Corporaal, et al. Exploring processor parallelism: Estimation methods and optimization strategies, 2013, 2013 IEEE 16th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS).
[24] Monica S. Lam, et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines, 1998.
[25] Pekka Jääskeläinen, et al. Loop Scheduling for Transport Triggered Architecture Processors, 2006, 2006 International Symposium on System-on-Chip.
[26] Gorker Alp Malazgirt, et al. MIPT: Rapid exploration and evaluation for migrating sequential algorithms to multiprocessing systems with multi-port memories, 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).
[27] Bede Liu, et al. Understanding multimedia application characteristics for designing programmable media processors, 1998, Electronic Imaging.
[28] Sijung Hu, et al. BioThreads: A Novel VLIW-Based Chip Multiprocessor for Accelerating Biomedical Image Processing Applications, 2012, IEEE Transactions on Biomedical Circuits and Systems.
[29] Zvi Drezner, et al. An Efficient Genetic Algorithm for the p-Median Problem, 2003, Ann. Oper. Res.
[30] Wayne H. Wolf, et al. Data-path synthesis of VLIW video signal processors, 1998, Proceedings of the 11th International Symposium on System Synthesis.
[31] B. Ramakrishna Rau, et al. Machine-Description Driven Compilers for EPIC and VLIW Processors, 1999, Des. Autom. Embed. Syst.
[32] Vicki H. Allan, et al. Software pipelining, 1995, CSUR.
[33] K.J. O'Connor, et al. Design issues for very-long-instruction-word VLSI video signal processors, 1996, VLSI Signal Processing, IX.
[34] B. Ramakrishna Rau, et al. PICO: Automatically Designing Custom Computers, 2002, Computer.
[35] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[36] P. Faraboschi, et al. Lx: a technology platform for customizable VLIW embedded processing, 2000, Proceedings of the 27th International Symposium on Computer Architecture.
[37] Vittorio Zaccaria, et al. A framework for Compiler Level statistical analysis over customized VLIW architecture, 2013, 2013 IFIP/IEEE 21st International Conference on Very Large Scale Integration (VLSI-SoC).
[38] Andrew Wolfe, et al. Available parallelism in video applications, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.
[39] Cameron McNairy, et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.
[40] Thierry Lecroq, et al. The exact online string matching problem: A review of the most recent results, 2013, CSUR.
[41] Kemal Ebcioglu, et al. A study on the number of memory ports in multiple instruction issue machines, 1993, MICRO 1993.
[42] Fan Yang, et al. Flexible VLIW processor based on FPGA for efficient embedded real-time image processing, 2012, Journal of Real-Time Image Processing.
[43] Todd M. Austin, et al. Dynamic dependency analysis of ordinary programs, 1992, ISCA '92.
[44] Paolo Ienne, et al. Making wide-issue VLIW processors viable on FPGAs, 2012, TACO.
[45] Geoffrey Brown, et al. ρ-VEX: A reconfigurable and extensible softcore VLIW processor, 2008, 2008 International Conference on Field-Programmable Technology.
[46] Paolo Faraboschi, et al. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools, 2004.
[47] Jung Ho Ahn, et al. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing, 2013, TACO.
[48] Soo-Mook Moon, et al. Generalized Multiway Branch Unit for VLIW Microprocessors, 1995, IEEE Trans. Parallel Distributed Syst.