Paper information - Store Vulnerability Window (SVW): A Filter and Potential Replacement for Load Re-Execution

Store Vulnerability Window (SVW): A Filter and Potential Replacement for Load Re-Execution

Load scheduling and execution are performance-critical aspects of dynamically-scheduled processors. Several techniques employ speculation on loads with respect to older stores to improve some aspect of load processing. Speculative scheduling and speculative indexed store-load forwarding are two examples. Speculative actions require verification. One simple mechanism that can verify any load speculation is in-order re-execution prior to commit. The drawback of load re-execution is data cache bandwidth consumption. If a given technique requires a sufficient fraction of the loads to re-execute, the resulting contention can severely compromise the intended benefit. Store Vulnerability Window (SVW) is an address-based filtering mechanism that significantly reduces the number of loads that must re-execute to verify a given speculative technique. The high-level idea is that a load need not re-execute if the address it reads has not been written to in a long time. SVW realizes this idea using a store sequence numbering scheme and an adaptation of Bloom filtering. An SVW implementation with a 1KB filter can reduce re-executions by a factor of 200 and virtually eliminate the overhead of re-execution-based verification. The same SVW implementation can be used as a complete replacement for re-execution with only 3% overhead.
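The abstract names the two ingredients (store sequence numbers and a Bloom-style filter) without spelling out how they combine. The following is a minimal sketch of one way the filtering test could work, not the paper's actual design. It assumes that every committed store receives a monotonically increasing sequence number, that a small table indexed by a hash of the address remembers the sequence number of the last store committed to that entry, and that each load records the newest committed sequence number at the time it executes. The class name `SVWFilter`, its methods, the table size, and the hash function are all hypothetical.

```python
class SVWFilter:
    """Toy model of a sequence-number-tagged, Bloom-style filter (illustrative only)."""

    def __init__(self, num_entries=512):
        # Assumption: a direct-mapped table with an arbitrary size. Each entry
        # holds the sequence number of the last committed store whose address
        # hashed to that entry (0 means "never written").
        self.num_entries = num_entries
        self.table = [0] * num_entries
        self.last_committed_ssn = 0

    def _index(self, addr):
        # Assumption: a simple hash that drops the low 3 bits (8-byte words)
        # and takes the remainder modulo the table size.
        return (addr >> 3) % self.num_entries

    def commit_store(self, addr):
        """Assign the next sequence number to a committing store and record it."""
        self.last_committed_ssn += 1
        self.table[self._index(addr)] = self.last_committed_ssn
        return self.last_committed_ssn

    def snapshot(self):
        """Sequence number a load would record when it executes."""
        return self.last_committed_ssn

    def must_reexecute(self, addr, load_window_ssn):
        """True if a store newer than the load's snapshot may have written addr."""
        return self.table[self._index(addr)] > load_window_ssn


if __name__ == "__main__":
    f = SVWFilter(num_entries=512)
    f.commit_store(0x1000)
    window = f.snapshot()            # the load executes here

    f.commit_store(0x1008)           # newer store, different table entry
    print(f.must_reexecute(0x1000, window))   # False: the load is filtered

    f.commit_store(0x2000)           # aliases with 0x1000 in this table
    print(f.must_reexecute(0x1000, window))   # True: conservative false positive
```

As in any Bloom-style structure, aliasing can only cause unnecessary re-executions, never a missed one, so in this sketch the filter stays safe however small the table is; the table size trades storage against the false-positive rate.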

Amir Roth
