Patty: a pattern-based parallelization tool for the multicore age

The free lunch of ever-increasing clock frequencies is over. Performance-critical sequential software must be parallelized, a task that is tedious, difficult, error-prone, knowledge-intensive, and time-consuming. To assist software engineers effectively, parallelization tools need to address detection, transformation, correctness, and performance together. This paper introduces a pattern-based process model that assists in all four parallelization tasks and thereby facilitates the transformation of legacy software that was not developed with multicore in mind. Our approach uses optimistic parallelization and generates a semantic model from static and dynamic information. With this information we detect parallelizable regions and runtime-relevant tuning parameters. The regions are then transformed into tunable parallel patterns. The process model covers the detection of parallelizable regions and the identification of appropriate parallelization strategies, and it extends traditional parallelization processes with correctness and performance validation. We implemented the process model in Patty, a tool that actively assists engineers in the tedious and error-prone tasks of software parallelization. This paper also presents a user study that compares the effectiveness of optimistic pattern-based parallelization as implemented in Patty with (1) a popular commercial parallelization tool and (2) purely manual parallelization. In the study, Patty outperforms both control groups in subjective and objective measurements: it receives the best average scores from its users while delivering the best results in the least amount of time. Patty achieves parallel performance comparable to that of a skilled parallel software engineer within minutes rather than days of work. This makes our approach attractive for experts and inexperienced software engineers alike.
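
To make the idea of a "tunable parallel pattern" with a correctness validation more concrete, the following is a minimal illustrative sketch in Python. It is not Patty's implementation or generated output: the function names (`sequential_map`, `tunable_parallel_map`, `validate`) and the two tuning parameters (`workers`, `chunk_size`) are assumptions chosen for illustration. The sketch shows a sequential loop, a data-parallel variant of the same loop that exposes tuning parameters, and a check that both produce identical results. Because of CPython's global interpreter lock, the thread pool here illustrates the structure of the pattern rather than an actual speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_map(func, items):
    """Original sequential region: apply func to every item in order."""
    return [func(x) for x in items]


def _apply_chunk(func, chunk):
    """Process one contiguous chunk of the input sequentially."""
    return [func(x) for x in chunk]


def tunable_parallel_map(func, items, workers=4, chunk_size=64):
    """Data-parallel variant of the loop with two tuning parameters.

    workers and chunk_size are hypothetical tuning parameters; an
    auto-tuner could vary them at run time to search for a fast setting.
    Assumes func has no side effects and iterations are independent.
    """
    items = list(items)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so output order
        # matches the sequential version.
        results = pool.map(lambda c: _apply_chunk(func, c), chunks)
        return [y for chunk in results for y in chunk]


def validate(func, items, **tuning):
    """Correctness validation: parallel output must equal sequential output."""
    items = list(items)
    return sequential_map(func, items) == tunable_parallel_map(func, items, **tuning)
```

A call such as `validate(lambda x: x * x, range(1000), workers=8, chunk_size=50)` compares both variants for one parameter setting; repeating the check across several settings gives a simple, non-exhaustive form of the correctness validation that the abstract describes.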
