Deep Data Flow Analysis

Compiler architects increasingly look to machine learning when building heuristics for compiler optimization. The promise of automatic heuristic design, freeing the compiler engineer from the complex interactions of program, architecture, and other optimizations, is alluring. However, most machine learning methods cannot replicate even the simplest of the abstract interpretations of data flow analysis that are critical to making good optimization decisions. This must change for machine learning to become the dominant technology in compiler heuristics. To this end, we propose ProGraML (Program Graphs for Machine Learning), a language-independent, portable representation of whole-program semantics for deep learning. To benchmark current and future learning techniques for compiler analyses, we introduce an open dataset of 461k LLVM Intermediate Representation (IR) files covering five source programming languages, together with 15.4M corresponding data flow results. We formulate data flow analysis as a message passing neural network (MPNN) problem and show that, using ProGraML, standard analyses can be learned, yielding improved performance on downstream compiler optimization tasks.
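
As a rough illustration of the MPNN formulation (not the paper's implementation), the sketch below propagates node states along the edges of a toy program graph for a fixed number of steps, mirroring the fixed-point iteration of classical data flow analysis. The toy graph, the random weight matrices `W_msg` and `W_upd`, the step count `T`, and the reachability-style initialization are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch: gated message passing over a toy "program graph",
# in the spirit of learning a data flow analysis with an MPNN.
# Node states stand in for data flow facts; T propagation steps
# stand in for the iterations of a classical fixed-point solver.
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 5 statement nodes, directed control flow edges (src -> dst).
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 3)]
num_nodes, dim = 5, 16

# Hypothetical "learned" parameters (random here, for illustration only).
W_msg = rng.normal(scale=0.1, size=(dim, dim))       # message transform
W_upd = rng.normal(scale=0.1, size=(2 * dim, dim))   # state update

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Initial node states: e.g. an instruction embedding plus a flag marking
# the analysis root (here node 0 is "marked", as in a reachability query).
h = np.zeros((num_nodes, dim))
h[0, 0] = 1.0

T = 4  # propagation steps; a real system would scale this with graph size
for _ in range(T):
    # Aggregate messages along control flow edges.
    msg = np.zeros_like(h)
    for src, dst in edges:
        msg[dst] += h[src] @ W_msg
    # Gated update: a simplified stand-in for the GRU cell used in GGNNs.
    gate = sigmoid(np.concatenate([h, msg], axis=1) @ W_upd)
    h = gate * np.tanh(msg) + (1.0 - gate) * h

# A per-node readout (e.g. a linear layer plus threshold) would then predict
# the analysis result, such as "is this node reachable from the root?".
print(h.shape)  # (5, 16)
```

A trained model along the lines of ProGraML would instead use learned, edge-type-specific transforms (control, data, and call edges), a GRU-based update, and supervision from labeled data flow results such as those in the released dataset.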
