OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints

We present a new approach to the type inference problem for dynamic languages. Our goal is to combine logical constraints, that is, deterministic information from a type system, with natural constraints, that is, uncertain statistical information about types learnt from sources like identifier names. To this end, we introduce a framework for probabilistic type inference that combines logic and learning: logical constraints on the types are extracted from the program, and deep learning is applied to predict types from surface-level code properties that are statistically associated with types, such as variable names. The key insight of our method is to constrain the predictions from the learning procedure to respect the logical constraints, which we achieve by relaxing the logical inference problem of type prediction into a continuous optimisation problem. As a proof of concept, we build a tool called OptTyper to predict missing types for TypeScript files. OptTyper combines a continuous interpretation of logical constraints, derived by a simple program transformation and static analysis of TypeScript code, with natural constraints obtained from a deep learning model that learns naming conventions for types from a large codebase. By evaluating OptTyper, we show that the combination of logical and natural constraints yields a large improvement in performance over either kind of information individually and achieves a 3% improvement over the state of the art.
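To make the relaxation concrete, the following is a minimal sketch of the general idea, not OptTyper's actual implementation. It assumes two variables `x` and `y`, two candidate types, invented model probabilities standing in for the natural constraints, and a single logical constraint "x and y have the same type" (as might arise from an assignment). Conjunction and disjunction are relaxed with the product and probabilistic sum, which is one possible continuous interpretation chosen here for illustration; all names and numbers are hypothetical, and plain gradient descent with finite-difference gradients is used only to keep the example dependency-free.

```python
import math

TYPES = ["number", "string"]

# Hypothetical natural constraints: type probabilities a learned model might
# emit from identifier names (values invented for this sketch).
NATURAL = {
    "x": [0.9, 0.1],  # name strongly suggests number
    "y": [0.4, 0.6],  # name weakly suggests string
}
VARS = list(NATURAL)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_and(a, b):
    return a * b  # product relaxation of conjunction (assumed choice)


def soft_or(a, b):
    return a + b - a * b  # probabilistic-sum relaxation of disjunction


def loss(logits, use_logic):
    k = len(TYPES)
    probs = {v: softmax(logits[k * i:k * (i + 1)]) for i, v in enumerate(VARS)}
    total = 0.0
    # Natural term: cross-entropy against the model's predictions.
    for v in VARS:
        total -= sum(q * math.log(p + 1e-9) for q, p in zip(NATURAL[v], probs[v]))
    if use_logic:
        # Logical term: (x:number AND y:number) OR (x:string AND y:string).
        px, py = probs["x"], probs["y"]
        same = soft_or(soft_and(px[0], py[0]), soft_and(px[1], py[1]))
        total -= math.log(same + 1e-9)
    return total


def infer(use_logic, lr=0.5, steps=500, eps=1e-5):
    k = len(TYPES)
    logits = [0.0] * (len(VARS) * k)
    for _ in range(steps):
        grad = []
        for i in range(len(logits)):
            up, down = logits[:], logits[:]
            up[i] += eps
            down[i] -= eps
            # Central finite difference approximates the partial derivative.
            grad.append((loss(up, use_logic) - loss(down, use_logic)) / (2 * eps))
        logits = [z - lr * g for z, g in zip(logits, grad)]
    result = {}
    for i, v in enumerate(VARS):
        p = softmax(logits[k * i:k * (i + 1)])
        result[v] = TYPES[p.index(max(p))]
    return result


natural_only = infer(use_logic=False)
with_logic = infer(use_logic=True)
print("natural only:", natural_only)
print("with logical constraint:", with_logic)
```

In this toy setting the natural constraints alone assign `y` the type `string`, whereas adding the relaxed logical constraint pulls `y` towards `number` so that it agrees with the confidently predicted `x`. A realistic system would use automatic differentiation and a much larger type vocabulary.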
