Collection-focused Parallelism

Constructing parallel software is, in essence, the process of associating 'work' with computational units. What constitutes work depends on the model of parallelism in use, and the choice of model can profoundly affect both programmer productivity and run-time efficiency. Since moving data accounts for the majority of parallelism overhead, and accessing data for the majority of parallelism errors, data items should be the basis for describing parallel work. Because data items rarely exist in isolation and are instead parts of larger collections, we argue that subsets of collections should be the basic unit of parallelism. This requires a semantically rich way of referring to these sub-collections. Sub-collections are not guaranteed to be disjoint, so an efficient run-time mechanism is required to maintain correctness. With a focus on complex systems, we present some of the challenges inherent in this approach and describe how we are extending Synchronization via Scheduling (SvS) and other techniques to overcome them. We also discuss our experience incorporating these techniques into a modern video game engine used in an in-development title.
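The abstract leaves the run-time mechanism unspecified. As a rough illustration of the idea, not the paper's implementation, the sketch below assumes SvS-style Bloom-filter signatures to conservatively test whether two tasks' sub-collections might overlap, running non-conflicting tasks in parallel and deferring the rest; all names here (Signature, Task, run) are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Illustrative only: a sub-collection is summarized by a Bloom-filter-style
// signature over the element IDs it touches. Two tasks may run in parallel
// only when their signatures do not intersect; the test is conservative, so
// false positives serialize needlessly but never break correctness.
struct Signature {
    std::uint64_t bits = 0;
    void add(std::uint64_t id) {
        bits |= 1ull << (id % 64);                  // first "hash" for brevity
        bits |= 1ull << ((id * 2654435761u) % 64);  // second hash
    }
    bool overlaps(const Signature& other) const {
        return (bits & other.bits) != 0;
    }
};

struct Task {
    Signature sig;                 // elements this task reads/writes
    std::function<void()> body;
};

// Greedy scheduler sketch: repeatedly build a batch of mutually
// non-overlapping tasks, run the batch in parallel, defer the rest.
void run(std::vector<Task> tasks) {
    while (!tasks.empty()) {
        std::vector<Task> batch, deferred;
        Signature batchSig;
        for (auto& t : tasks) {
            if (t.sig.overlaps(batchSig)) {
                deferred.push_back(std::move(t));  // conflicts: run later
            } else {
                batchSig.bits |= t.sig.bits;
                batch.push_back(std::move(t));
            }
        }
        std::vector<std::thread> workers;
        for (auto& t : batch) workers.emplace_back(t.body);
        for (auto& w : workers) w.join();
        tasks = std::move(deferred);
    }
}

int main() {
    std::vector<int> data(128, 0);
    auto makeTask = [&](std::vector<std::uint64_t> ids) {
        Task t;
        for (auto id : ids) t.sig.add(id);
        t.body = [&data, ids] { for (auto id : ids) data[id] += 1; };
        return t;
    };
    // Tasks 1 and 2 touch disjoint sub-collections and may run concurrently;
    // task 3 overlaps task 1 on element 2 and is deferred to a later batch.
    run({ makeTask({0, 1, 2}), makeTask({10, 11}), makeTask({2, 3}) });
    std::cout << data[2] << "\n";  // prints 2: both tasks touching element 2 ran
    return 0;
}
```

Because the signature test is conservative, a false positive merely serializes two tasks that could have run in parallel; it never admits a conflicting pair, which is what makes such a scheme safe for sub-collections that are not guaranteed to be disjoint.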
