POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization

Synchronization and data movement are the key impediments to efficient parallel execution. To keep data shared by multiple threads consistent, the programmer must use synchronization (e.g., mutex locks) to serialize the threads' accesses to that data. This limits parallelism because threads are forced to access shared resources one at a time. Additionally, systems rely on cache coherence to ensure that processors always operate on the most up-to-date version of a value, even in the presence of private caches. Coherence protocol implementations cause processors to serialize their accesses to shared data, further limiting parallelism and performance.
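To make the contrast concrete, the following is a minimal software sketch of the two styles the title alludes to: lock-serialized updates to one shared structure versus privatized per-thread copies that are merged afterwards. It is an illustration only and not the architecture or programming model proposed in the poster. The function names `histogram_locked` and `histogram_privatized` and the histogram workload are assumptions chosen for the example. The merge is valid here because integer addition is commutative and associative, so the order in which updates are applied does not affect the result.

```python
import threading
from collections import Counter


def histogram_locked(chunks):
    """Baseline: every update to the shared histogram takes one mutex."""
    shared = Counter()
    lock = threading.Lock()

    def worker(chunk):
        for key in chunk:
            with lock:  # threads are serialized at this point
                shared[key] += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


def histogram_privatized(chunks):
    """Privatized: each thread updates its own copy, merged once at the end."""
    private = [Counter() for _ in chunks]  # one private histogram per thread

    def worker(index, chunk):
        local = private[index]
        for key in chunk:
            local[key] += 1  # no lock: only this thread touches `local`

    threads = [
        threading.Thread(target=worker, args=(i, c)) for i, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge step: Counter.update adds counts, and addition commutes,
    # so the merge order does not matter.
    merged = Counter()
    for p in private:
        merged.update(p)
    return merged


if __name__ == "__main__":
    data = [[0, 1, 1, 2], [2, 2, 3], [0, 3, 3, 3]]
    print(histogram_locked(data) == histogram_privatized(data))
```

Both functions compute the same histogram; the privatized version simply moves all cross-thread interaction into a single merge after the threads have joined. In CPython the global interpreter lock hides most of the performance difference, so the sketch shows the structure of the technique rather than its speedup.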
