ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization heuristics, but a gap remains between what state-of-the-art methods achieve and the performance of an optimal heuristic. Closing this gap requires improvements in two key areas: a representation that accurately captures the semantics of programs, and a model architecture expressive enough to reason about this representation. We introduce ProGraML (Program Graphs for Machine Learning), a novel graph-based program representation using a low-level, language-agnostic, and portable format, together with machine learning models capable of performing complex downstream tasks over these graphs. The ProGraML representation is a directed attributed multigraph that captures control, data, and call relations, and summarizes instruction and operand types and ordering. Message Passing Neural Networks propagate information through this structured representation, enabling whole-program or per-vertex classification tasks. ProGraML provides a general-purpose program representation that equips learnable models to perform the types of program analysis that are fundamental to optimization. To this end, we first evaluate our approach on a suite of traditional compiler analysis tasks: control flow reachability, dominator trees, data dependencies, variable liveness, and common subexpression detection. On a benchmark dataset of 250k LLVM-IR files covering six source programming languages, ProGraML achieves an average F1 score of 94.0, significantly outperforming state-of-the-art approaches. We then apply our approach to two high-level tasks, heterogeneous device mapping and program classification, setting a new state of the art in both.
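As a rough illustration of the ideas in the abstract, the sketch below builds a toy directed multigraph with typed edges (control, data, call) and propagates a binary "reached" label along control edges, which mirrors the control flow reachability task in a hand-written, non-learned form. All names here (`ProgramGraph`, `reachable`, the node texts, the `position` field) are hypothetical and chosen for illustration; this is not the ProGraML API, and a real Message Passing Neural Network would learn its update function rather than use a fixed rule.

```python
# Edge flow types named in the abstract. The integer "position" attribute is an
# assumption standing in for the operand ordering the abstract mentions.
CONTROL, DATA, CALL = "control", "data", "call"


class ProgramGraph:
    """Toy directed attributed multigraph (hypothetical, not the ProGraML API)."""

    def __init__(self):
        self.nodes = {}  # node id -> text attribute (e.g. an opcode or a type)
        self.edges = []  # list of (src, dst, flow, position); parallel edges allowed

    def add_node(self, node_id, text):
        self.nodes[node_id] = text

    def add_edge(self, src, dst, flow, position=0):
        self.edges.append((src, dst, flow, position))


def reachable(graph, root, flow=CONTROL, max_steps=None):
    """Propagate a 'reached' label along edges of one flow type.

    Each step updates every node synchronously from the previous step's state,
    loosely analogous to one round of message passing. Iteration stops at a
    fixed point, or after max_steps rounds if that is given.
    """
    state = {n: n == root for n in graph.nodes}
    steps = 0
    changed = True
    while changed and (max_steps is None or steps < max_steps):
        changed = False
        new_state = dict(state)
        for src, dst, edge_flow, _ in graph.edges:
            if edge_flow == flow and state[src] and not new_state[dst]:
                new_state[dst] = True
                changed = True
        state = new_state
        steps += 1
    return {n for n, reached in state.items() if reached}


# A small diamond-shaped control flow with one shared operand (made-up example).
g = ProgramGraph()
g.add_node(0, "br")
g.add_node(1, "add")
g.add_node(2, "ret")
g.add_node(3, "mul")
g.add_node(4, "i32")  # a value node, connected only through data edges

g.add_edge(0, 1, CONTROL)
g.add_edge(0, 3, CONTROL)
g.add_edge(1, 2, CONTROL)
g.add_edge(3, 2, CONTROL)
g.add_edge(4, 1, DATA, position=0)
g.add_edge(4, 3, DATA, position=1)

print(sorted(reachable(g, 0)))               # nodes reachable from the entry
print(sorted(reachable(g, 1)))               # nodes reachable from "add"
print(sorted(reachable(g, 0, max_steps=1)))  # only one propagation round
```

Limiting `max_steps` shows why the number of propagation rounds matters: after a single round, only the direct control successors of the root are labelled, so information about more distant nodes has not yet arrived.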
