Reducing fetch architecture complexity using procedure inlining
[1] Norman P. Jouppi, et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, 2002, ISCA.
[2] A. Mendelson, et al. Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing, 1998, ISCA 1998.
[3] Vikas Agarwal, et al. Clock rate versus IPC: the end of the road for conventional microarchitectures, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[4] A. J. KleinOsowski, et al. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research, 2002, IEEE Computer Architecture Letters.
[5] Steve Johnson, et al. Compiling C for vectorization, parallelization, and inline expansion, 1988, PLDI '88.
[6] Sanjay J. Patel, et al. Increasing the size of atomic instruction blocks using control flow assertions, 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[7] Daniel A. Jiménez, et al. The impact of delay on the design of branch predictors, 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[8] David R. Kaeli, et al. Using cache line coloring to perform aggressive procedure inlining, 2000, CARN.
[9] Yale N. Patt, et al. Alternative fetch and issue policies for the trace cache fetch mechanism, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[10] Wen-mei W. Hwu, et al. Achieving High Instruction Cache Performance With An Optimizing Compiler, 1989, The 16th Annual International Symposium on Computer Architecture.
[11] Eric Rotenberg, et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[12] Mateo Valero, et al. Fetching instruction streams, 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[13] Andrew Ayers, et al. Aggressive inlining, 1997, PLDI '97.
[14] Koen De Bosschere, et al. alto: a link-time optimizer for the Compaq Alpha, 2001, Softw. Pract. Exp.
[15] M. Valero, et al. Latency tolerant branch predictors, 2003, Innovative Architecture for Future Generation High-Performance Processors and Systems, 2003.
[16] Glenn Reinman, et al. A scalable front-end architecture for fast instruction delivery, 1999, ISCA.
[17] Norman P. Jouppi, et al. CACTI 3.0: an integrated cache timing, power, and area model, 2001.