Parallel schedule synthesis for attribute grammars

We examine how to synthesize a parallel schedule of structured traversals over trees. In our system, programs are specified declaratively as attribute grammars. Our synthesizer automatically, correctly, and quickly schedules an attribute grammar as a composition of parallel tree traversals, and our downstream compiler optimizes the result for GPUs and multicore CPUs. We support the design of efficient schedules in two ways. First, we introduce a declarative language of schedules in which programmers may constrain any part of the schedule; the synthesizer completes and autotunes the rest. Second, the synthesizer answers debugging queries about how a partial schedule may be completed. We evaluate our approach with two case studies. In the first, we created the first parallel schedule for a large fragment of CSS and report a 3x multicore speedup. In the second, we created an interactive GPU-accelerated animation of over 100,000 nodes.
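To make "a composition of tree traversals" concrete, the following is a minimal hand-written sketch, not the synthesizer described above. It assumes a toy layout grammar with one synthesized attribute (`width`) and one inherited attribute (`x`). The names `Node`, `own_width`, `bottom_up_width`, `top_down_x`, and `run_schedule` are hypothetical and chosen only for illustration. The schedule here is fixed by hand as a bottom-up pass followed by a top-down pass; in each pass the recursive calls on sibling subtrees are independent, which is where a parallel runtime could fork work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    own_width: int                                   # input attribute
    children: List["Node"] = field(default_factory=list)
    width: int = 0                                   # synthesized: flows up the tree
    x: int = 0                                       # inherited: flows down the tree


def bottom_up_width(node: Node) -> None:
    """Pass 1 (postorder): width depends only on the children's widths."""
    for child in node.children:
        # Sibling subtrees do not read each other's attributes in this pass,
        # so these calls could run in parallel.
        bottom_up_width(child)
    node.width = node.own_width + sum(c.width for c in node.children)


def top_down_x(node: Node, x: int = 0) -> None:
    """Pass 2 (preorder): x depends on the parent's x and on sibling widths."""
    node.x = x
    offset = x + node.own_width
    for child in node.children:
        # All widths were computed in pass 1, so each child's offset is known
        # before its subtree is visited; the recursive calls are independent.
        top_down_x(child, offset)
        offset += child.width


def run_schedule(root: Node) -> Node:
    """A hand-picked schedule: one bottom-up traversal, then one top-down."""
    bottom_up_width(root)
    top_down_x(root)
    return root


# Small usage example: a root with two leaf children.
leaf_a = Node(own_width=2)
leaf_b = Node(own_width=3)
root = run_schedule(Node(own_width=1, children=[leaf_a, leaf_b]))
print(root.width, leaf_a.x, leaf_b.x)
```

The two passes cannot be merged into a single traversal because `x` for a later sibling reads the `width` of an earlier one. An ordering constraint of this kind is what a schedule has to respect.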
