Filter Caching for Free: The Untapped Potential of the Store-Buffer
[1] Alexander V. Veidenbaum,et al. Reducing data cache energy consumption via cached load/store queue , 2003, ISLPED '03.
[2] Francisco Tirado,et al. L1 Data Cache Power Reduction Using a Forwarding Predictor , 2010, PATMOS.
[3] Sarita V. Adve,et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[4] Kaushik Roy,et al. Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, MICRO.
[5] Erik Hagersten,et al. Cost-effective speculative scheduling in high performance processors , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[6] Todd M. Austin,et al. Cyclone: a broadcast-free dynamic instruction scheduler with selective replay , 2003, ISCA '03.
[7] R. E. Kessler,et al. Inexpensive implementations of set-associativity , 1989, ISCA '89.
[8] Kazuaki Murakami,et al. Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).
[9] Milo M. K. Martin,et al. NoSQ: Store-Load Communication without a Store Queue , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[10] Nikos Nikoleris,et al. Addressing Energy Challenges in Filter Caches , 2017, 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[11] Margaret Martonosi,et al. COATCheck: Verifying Memory Ordering at the Hardware-OS Interface , 2016, ASPLOS.
[12] Glenn Reinman,et al. Scaling the issue window with look-ahead latency prediction , 2004, ICS '04.
[13] Koen De Bosschere,et al. 2FAR: A 2bcgskew Predictor Fused by an Alloyed Redundant History Skewed Perceptron Branch Predictor , 2005, J. Instr. Level Parallelism.
[14] Stefanos Kaxiras,et al. Non-Speculative Store Coalescing in Total Store Order , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[15] Roberto Giorgi,et al. Reducing leakage in power-saving capable caches for embedded systems by using a filter cache , 2007, MEDEA '07.
[16] David Black-Schaffer,et al. Dynamically Disabling Way-prediction to Reduce Instruction Replay , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).
[17] T. N. Vijaykumar,et al. Reducing Design Complexity of the Load/Store Queue , 2003, MICRO.
[18] Sam Ainsworth,et al. Graph Prefetching Using Data Structure Knowledge , 2016, ICS.
[19] Kevin Skadron,et al. Design issues and tradeoffs for write buffers , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[20] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[21] Stefanos Kaxiras,et al. Applying Decay to Reduce Dynamic Power in Set-Associative Caches , 2007, HiPEAC.
[22] Dirk Grunwald,et al. Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[23] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011.
[24] Stefanos Kaxiras,et al. The Superfluous Load Queue , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25] Stefanos Kaxiras,et al. Complexity-effective multicore coherence , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[26] Stefanos Kaxiras,et al. Racer: TSO consistency via race detection , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Andreas Moshovos,et al. Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.
[28] Pierre Michaud,et al. Data-flow prescheduling for large instruction windows in out-of-order processors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[29] William H. Mangione-Smith,et al. The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[30] Andreas Moshovos,et al. Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[31] Ibrahim N. Hajj,et al. Using dynamic cache management techniques to reduce energy in a high-performance processor , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).
[32] Glenn Reinman,et al. Precise Instruction Scheduling , 2005, J. Instr. Level Parallelism.
[33] Frank Vahid,et al. A Way-Halting Cache for Low-Energy High-Performance Systems , 2005, IEEE Computer Architecture Letters.
[34] Hsien-Hsin S. Lee,et al. Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches , 2009, ISLPED.
[35] Stéphan Jourdan,et al. Speculation techniques for improving load related instruction scheduling , 1999, ISCA.
[36] Joel S. Emer,et al. Memory dependence prediction using store sets , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[37] Gabriel H. Loh,et al. Fire-and-Forget: Load/Store Scheduling with No Store Queue at All , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[38] Kimming So,et al. Cache Operations by MRU Change , 1988, IEEE Trans. Computers.
[39] Milo M. K. Martin,et al. Scalable store-load forwarding via store queue index prediction , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).