Equality Saturation for Tensor Graph Superoptimization

One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide whether rewrite rules should be applied and in which order. Prior research has shown that more optimal tensor computation graphs can be discovered by searching over sequences of substitutions rather than relying on heuristics. However, we observe that existing approaches to tensor graph superoptimization, in both production and research frameworks, apply substitutions sequentially. Such sequential search methods are sensitive to the order in which substitutions are applied and often explore only a small fragment of the exponential space of equivalent graphs. This paper presents a novel technique for tensor graph superoptimization that employs equality saturation to apply all possible substitutions at once. We show that our approach can find optimized graphs with up to 16% speedup over the state of the art, while spending on average 48x less time optimizing.
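As a concrete illustration (not taken from the paper itself), the minimal sketch below shows the core idea of equality saturation using the egg e-graph library in Rust: every rewrite rule is applied to a shared e-graph repeatedly until saturation, so no single rewrite order is ever committed to, and a cheapest equivalent program is extracted afterwards. The operator names (`matmul`, `ewadd`), the two hand-written rules, and the `AstSize` cost model are illustrative assumptions standing in for a real tensor language and cost function.

```rust
// Assumes the `egg` crate (e.g., egg = "0.9" in Cargo.toml).
use egg::{rewrite as rw, *};

fn main() {
    // Two illustrative tensor-algebra rewrites; a real optimizer would use a
    // much larger rule set.
    let rules: &[Rewrite<SymbolLang, ()>] = &[
        rw!("ewadd-comm";   "(ewadd ?x ?y)" => "(ewadd ?y ?x)"),
        rw!("matmul-assoc"; "(matmul (matmul ?a ?b) ?c)"
                         => "(matmul ?a (matmul ?b ?c))"),
    ];

    // Starting computation graph: (A * B) * C, written as an s-expression.
    let start: RecExpr<SymbolLang> = "(matmul (matmul A B) C)".parse().unwrap();

    // Equality saturation: apply all rules at once, growing an e-graph of
    // equivalent graphs instead of committing to one substitution order.
    let runner = Runner::default().with_expr(&start).run(rules);

    // Extract the best equivalent expression under a cost model (AstSize here,
    // as a stand-in for a learned or measured tensor cost model).
    let extractor = Extractor::new(&runner.egraph, AstSize);
    let (best_cost, best_expr) = extractor.find_best(runner.roots[0]);
    println!("best (cost {}): {}", best_cost, best_expr);
}
```

Because the e-graph compactly represents every graph reachable by any sequence of these rewrites, extraction can pick the globally cheapest representative, which is what lets this approach avoid the phase-ordering sensitivity of sequential search.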