Parallelizing Compiler for Single and Multicore Computing
[1] Tsutomu Yoshinaga, et al. The QC-2 parallel Queue processor architecture, 2008, J. Parallel Distributed Comput.
[2] Tsutomu Yoshinaga, et al. High-Level Modeling and FPGA Prototyping of Produced Order Parallel Queue Processor Core, 2006, The Journal of Supercomputing.
[3] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.
[4] Josep Llosa, et al. Quantitative Evaluation of Register Pressure on Software Pipelined Loops, 1998, International Journal of Parallel Programming.
[5] Liam Goudge, et al. Thumb: reducing the cost of 32-bit RISC performance in portable and consumer applications, 1996, COMPCON '96. Technologies for the Information Superhighway Digest of Papers.
[6] Tsutomu Yoshinaga, et al. Parallel Queue Processor Architecture Based on Produced Order Computation Model, 2005, The Journal of Supercomputing.
[7] Arquimedes Canedo, et al. Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture, 2011, The Journal of Supercomputing.
[8] Masahiro Sowa, et al. Design of a superscalar processor based on queue machine computation model, 1999, 1999 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 1999). Conference Proceedings (Cat. No. 99CH36368).
[9] Javier Zalamea, et al. Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures, 2004, International Journal of Parallel Programming.
[10] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[11] Allen, et al. Optimizing Compilers for Modern Architectures, 2004.
[12] Scott A. Mahlke, et al. Partitioning variables across register windows to reduce spill code in a low-power processor, 2005, IEEE Transactions on Computers.
[13] Herman Schmit, et al. Queue machines: hardware compilation in hardware, 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[14] Richard E. Kessler, et al. The Alpha 21264 microprocessor, 1999, IEEE Micro.
[15] Philip H. Sweany, et al. A Code Generation Framework for VLIW Architectures with Partitioned Register Banks, 2007.
[16] Arquimedes Canedo, et al. A new code generation algorithm for 2-offset producer order queue computation model, 2008, Comput. Lang. Syst. Struct.
[17] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[18] Lenwood S. Heath, et al. Stack and Queue Layouts of Directed Acyclic Graphs: Part I, 1999, SIAM J. Comput.
[19] Xue Ming Henry Huang. High-level loop transformations for architectures with partitioned register banks, 2000.
[20] Monica S. Lam, et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines, 1998.
[21] Henk Corporaal, et al. Partitioned register file for TTAs, 1995, MICRO 1995.
[22] Masahiro Sowa, et al. Design and architecture for an embedded 32-bit QueueCore, 2006, J. Embed. Comput.
[23] Edward S. Davidson, et al. Evaluating the Use of Register Queues in Software Pipelined Loops, 2001, IEEE Trans. Computers.
[24] Arquimedes Canedo, et al. Efficient compilation for queue size constrained queue processors, 2009, Parallel Comput.
[25] Gürhan Küçük, et al. Energy Efficient Register Renaming, 2003, PATMOS.
[26] SPARC International, Inc. The SPARC Architecture Manual: Version 8, 1992.
[27] Shlomit S. Pinter, et al. Register allocation with instruction scheduling, 1993, PLDI '93.
[28] Arquimedes Canedo, et al. Compiling for Reduced Bit-Width Queue Processors, 2010, J. Signal Process. Syst.
[29] Bruno R. Preiss, et al. Data flow on a queue machine, 1985, ISCA 1985.