Priority-based squash reducing methods in thread level speculation

Thread level speculation (TLS) aggressively transforms a long serial programme into multiple short parallel threads to significantly boost the performance of sequential programmes. However, frequent squashing, caused by violations between parallel threads, can greatly offset the benefits of parallelisation. Most existing work focuses on improving the TLS scheme itself, for example by preventing false sharing or by pre-computing or predicting values. In contrast, we observe that squashes are caused by messages whose arrival order is violated, so they can be reduced by rearranging TLS messages. In this paper, we first propose a priority-aware network-on-chip (NoC), which uses a priority-based packet arbitration policy to reorder messages at the router. We then extend this priority scheme by applying the prioritising policy in the directory, for TLS systems that use a directory-based cache coherence protocol. The extension results in a cost-less version. Experimental evaluation on five typical application kernels from SPEC2000 shows that our NoC approach reduces the squashing rate by 22% in the best case and 15% on average.
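As a rough illustration of priority-based packet arbitration at a router, the following sketch reorders buffered messages by a priority value instead of serving them strictly in arrival order. It is not the paper's design: the `Packet` and `PriorityArbiter` names, the "smaller value is served first" convention, and the tie-break by arrival order are all assumptions made for this example, and the abstract does not say how priorities are assigned to TLS messages.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Packet:
    # Assumption: a smaller priority value is served first.
    priority: int
    # Arrival sequence number, used only to break ties between equal priorities.
    seq: int
    # The message itself is excluded from ordering comparisons.
    payload: str = field(compare=False)


class PriorityArbiter:
    """Hypothetical arbiter for one router output port."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()

    def enqueue(self, priority, payload):
        # Buffer an incoming packet together with its arrival order.
        heapq.heappush(self._heap, Packet(priority, next(self._arrivals), payload))

    def grant(self):
        # Forward the buffered packet with the best priority, or None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload


def drain(arbiter):
    # Repeatedly grant until the buffer is empty and return the forwarding order.
    forwarded = []
    while (payload := arbiter.grant()) is not None:
        forwarded.append(payload)
    return forwarded
```

With this sketch, packets enqueued in the order (2, "c"), (0, "a"), (2, "d"), (1, "b") are forwarded according to their priority values, and the two packets with equal priority keep their arrival order. A real router would arbitrate per cycle among input ports and would also need to avoid starving low-priority traffic, which this example does not model.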
