Efficient design-space exploration of custom instruction-set extensions