Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams

Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations, such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and whether it is safe to run in parallel given possible lambda expression side effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel, and to unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ~642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to its full potential.
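To make the safety concern concrete, the following is a minimal sketch of the kinds of stream pipelines the abstract distinguishes. It is not taken from the paper or its tool; the class name `StreamSketch` and its methods are hypothetical, and only standard `java.util.stream` calls are used. The first method is stateless and side-effect free, so running it in parallel does not change its result. The second has a lambda that mutates a shared, non-thread-safe list, so converting it to parallel could change its behavior. The third expresses the same computation without side effects, which makes a parallel version safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSketch {
    // Stateless, side-effect-free pipeline: the result is the same whether
    // the stream runs sequentially or in parallel.
    static long sumOfSquares(List<Integer> xs) {
        return xs.parallelStream().mapToLong(x -> (long) x * x).sum();
    }

    // The lambda mutates a shared ArrayList, which is not thread-safe.
    // This is kept sequential on purpose: switching to parallelStream()
    // here could lose elements or reorder them.
    static List<Integer> doubledWithSideEffect(List<Integer> xs) {
        List<Integer> out = new ArrayList<>();
        xs.stream().forEach(x -> out.add(x * 2));
        return out;
    }

    // Same computation without side effects. Collecting an ordered parallel
    // stream into a List preserves encounter order.
    static List<Integer> doubledSafely(List<Integer> xs) {
        return xs.parallelStream().map(x -> x * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(xs));          // prints 30
        System.out.println(doubledWithSideEffect(xs)); // prints [2, 4, 6, 8]
        System.out.println(doubledSafely(xs));         // prints [2, 4, 6, 8]
    }
}
```

Whether the parallel versions are actually faster depends on data size, the stream source, and the cost of each operation; for a four-element list the sequential form would likely win. That trade-off is what the abstract means by parallel execution being "possibly advantageous."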
