Enabling Branch-Mispredict Level Parallelism by Selectively Flushing Instructions
Stijn Eyerman | Wim Heirman | Ibrahim Hur | Sam Van den Steen
[1] Lieven Eeckhout,et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] Mateo Valero,et al. Control-flow independence reuse via dynamic vectorization , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[3] Srinivas Devadas,et al. IMP: Indirect memory prefetcher , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Haitham Akkary,et al. Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors , 2003, MICRO.
[5] Mayank Agarwal,et al. Exploiting Postdominance for Speculative Parallelization , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[6] Christos Faloutsos,et al. R-MAT: A Recursive Model for Graph Mining , 2004, SDM.
[7] Harold W. Cain,et al. SPF: Selective Pipeline Flush , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).
[8] Haitham Akkary,et al. Reducing branch misprediction penalty via selective branch recovery , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[9] Onur Mutlu,et al. Wish branches: combining conditional branching and predication for adaptive predicated execution , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[10] R.H. Dennard,et al. Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.
[11] Efraim Rotem,et al. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake , 2017, IEEE Micro.
[12] Margaret Martonosi,et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Farzad Samie,et al. Power and frequency analysis for data and control independence in embedded processors , 2011, 2011 International Green Computing Conference and Workshops.
[14] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[15] Balaram Sinharoy,et al. IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.
[16] John A. Miller,et al. Techniques for Graph Analytics on Big Data , 2013, 2013 IEEE International Congress on Big Data.
[17] James E. Smith,et al. Advanced Micro Devices , 2005 .
[18] Eric Rotenberg,et al. Control independence in trace processors , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[19] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[20] M. V. Wilkes,et al. The Art of Computer Programming, Volume 3, Sorting and Searching , 1974 .
[21] Richard A. Lethin,et al. Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture , 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[22] Sreenivas Subramoney,et al. Auto-Predication of Critical Branches , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[23] Quan M. Nguyen,et al. Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Trevor E. Carlson,et al. NOREBA: a compiler-informed non-speculative out-of-order commit processor , 2021, ASPLOS.
[25] Hang Liu,et al. SIMD-X: Programming and Processing of Graph Algorithms on GPUs , 2018, USENIX Annual Technical Conference.
[26] Philip S. Yu,et al. A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[27] David Thomas,et al. The Art in Computer Programming , 2001 .
[28] Stijn Eyerman,et al. Many-Core Graph Workload Analysis , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[29] Eric Rotenberg,et al. Transparent control independence (TCI) , 2007, ISCA '07.
[30] Chen-Yong Cher,et al. Skipper: a microarchitecture for exploiting control-flow independence , 2001, MICRO.
[31] Onur Mutlu,et al. Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[32] Mayank Agarwal,et al. Fetch-Criticality Reduction through Control Independence , 2008, 2008 International Symposium on Computer Architecture.
[33] Michael Gschwind,et al. IBM POWER8 processor core microarchitecture , 2015, IBM J. Res. Dev.
[34] A. Kopser,et al. Overview of the Next Generation Cray XMT , 2011 .
[35] Qi Li,et al. Distributed Control Independence for Composable Multi-processors , 2012, 2012 IEEE/ACIS 11th International Conference on Computer and Information Science.
[36] André Seznec,et al. A new case for the TAGE branch predictor , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] Heiner Litz,et al. Classifying Memory Access Patterns for Prefetching , 2020, ASPLOS.
[38] Jeremy Kepner,et al. Novel graph processor architecture, prototype system, and results , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[39] Mayank Agarwal,et al. Branch-mispredict level parallelism (BLP) for control independence , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[40] Dean M. Tullsen,et al. Control Flow Optimization Via Dynamic Reconvergence Prediction , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[41] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[42] Tianshi Chen,et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[43] Gurindar S. Sohi,et al. Register integration: a simple and efficient implementation of squash reuse , 2000, MICRO 33.
[44] Scott B. Baden,et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration , 2012, IEEE Micro.