Exploration of neural machine translation in autoformalization of mathematics in Mizar

In this paper we present several experiments on automatically translating informal mathematics into formal mathematics. In our context, informal mathematics refers to human-written mathematical sentences in LaTeX format, and formal mathematics refers to statements in the Mizar language. We conducted our experiments with three established neural machine-translation models that are known to deliver competitive results on translation between natural languages. To train these models we also prepared four informal-to-formal datasets. We compare and analyze our results according to whether the model is supervised or unsupervised. To augment the data available for autoformalization and to improve the results, we developed a custom type-elaboration mechanism and integrated it into the supervised translation.
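To make the task concrete, a hypothetical aligned training pair might look as follows; this example is illustrative only and is not drawn from the paper's actual datasets. The informal source is a LaTeX sentence and the formal target is a Mizar formula (assuming X, Y, Z have been reserved as sets):

    Informal (LaTeX): Suppose $X \subseteq Y$ and $Y \subseteq Z$. Then $X \subseteq Z$.
    Formal (Mizar):   X c= Y & Y c= Z implies X c= Z;

A supervised model is trained directly on such source-target pairs, whereas an unsupervised model must learn to align the two languages from separate, unpaired LaTeX and Mizar corpora.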
