Store Vulnerability Window (SVW): A Filter and Potential Replacement for Load Re-Execution

Load scheduling and execution are performance-critical aspects of dynamically-scheduled processors. Several techniques employ speculation on loads with respect to older stores to improve some aspect of load processing. Speculative scheduling and speculative indexed store-load forwarding are two examples. Speculative actions require verification. One simple mechanism that can verify any load speculation is in-order re-execution prior to commit. The drawback of load re-execution is data cache bandwidth contention. If a given technique requires a sufficient fraction of the loads to re-execute, the resulting contention can severely compromise the intended benefit.

Store Vulnerability Window (SVW) is an address-based filtering mechanism that significantly reduces the number of loads that must re-execute to verify a given speculative technique. The high-level idea is that a load need not re-execute if the address it reads has not been written to in a long time. SVW realizes this idea using a store sequence numbering scheme and an adaptation of Bloom filtering. An SVW implementation with a 1KB filter can reduce re-executions by a factor of 200 and virtually eliminate the overhead of re-execution-based verification. The same SVW implementation can be used as a complete replacement for re-execution with only 3% overhead.
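
The filtering idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `SVWFilter`, the table size, the 8-byte word granularity, the modulo hash, and the `safe_ssn` parameter (the sequence number of the youngest store the load is known to be safe against) are all hypothetical choices made for illustration. Each committed store receives a monotonically increasing store sequence number and records it in a hashed table. A load must re-execute only if the recorded number for its address is newer than the load's own safe point.

```python
class SVWFilter:
    """Hypothetical software model of an SVW-style re-execution filter."""

    def __init__(self, entries=512):
        self.entries = entries
        # One slot per hashed address: sequence number of the last
        # committed store that wrote there (0 means no store yet).
        self.table = [0] * entries
        self.next_ssn = 1

    def _index(self, addr):
        # Assumed hash: drop the 3 byte-offset bits of an 8-byte word,
        # then take the result modulo the table size.
        return (addr >> 3) % self.entries

    def commit_store(self, addr):
        """Assign the next store sequence number and record it."""
        ssn = self.next_ssn
        self.next_ssn += 1
        self.table[self._index(addr)] = ssn
        return ssn

    def must_reexecute(self, addr, safe_ssn):
        """True if a store newer than safe_ssn may have written addr."""
        return self.table[self._index(addr)] > safe_ssn


svw = SVWFilter(entries=4)          # tiny table to make aliasing visible
first = svw.commit_store(0x100)
quiet = svw.must_reexecute(0x100, safe_ssn=first)    # no newer store
svw.commit_store(0x100)
conflict = svw.must_reexecute(0x100, safe_ssn=first)  # newer store, same address
other = svw.must_reexecute(0x108, safe_ssn=0)         # different slot, never written
svw.commit_store(0x128)                               # maps to the same slot as 0x108
aliased = svw.must_reexecute(0x108, safe_ssn=0)       # false positive from aliasing
```

As with a Bloom filter, two addresses that hash to the same slot can trigger an unnecessary re-execution (a false positive), but in this model a newer store to the load's own address is never missed. This is why a filter of this kind can be small and still safe.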
