Machine learning in Python with no strings attached

Machine-learning frameworks in Python, such as scikit-learn, Keras, Spark, or Pyro, use embedded domain-specific languages (EDSLs) to assemble a computational graph. Unfortunately, these EDSLs make heavy use of strings as names for computational graph nodes and other entities, leading to repetitive and hard-to-maintain code that does not benefit from standard Python tooling. This paper proposes eliminating strings where possible, reusing Python variable names instead. We demonstrate this on two examples from opposite ends of the design space: Keras.na, a lightweight wrapper around the Keras library, and a new embedding of Stan into Python. Our techniques do not require modifications to the underlying library. Avoiding strings removes redundancy, simplifies maintenance, and enables Python tooling to better reason about the code and assist users.
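
To make the idea of reusing Python variable names concrete, here is a minimal sketch in plain Python. It is not the mechanism used by Keras.na or by the Stan embedding, and it calls no real library: `Layer`, `Net`, and `layer_names` are hypothetical names introduced only for illustration. The sketch relies on Python's standard `__set_name__` hook, which the interpreter invokes when an object is bound to an attribute in a class body, so that a graph node can pick up its name from the attribute instead of from a repeated string.

```python
class Layer:
    """Hypothetical graph node for illustration; not a Keras class."""

    def __init__(self, units, name=None):
        self.units = units
        self.name = name  # explicit string name, if the caller supplies one

    def __set_name__(self, owner, attr):
        # Python calls this when the instance is bound in a class body.
        # Only fill in the name if no explicit string was given.
        if self.name is None:
            self.name = attr


# String-based style: the name appears twice, as a variable and as a string,
# and the two can silently drift apart during refactoring.
hidden = Layer(64, name="hidden")


# String-free style: the attribute name becomes the node name.
class Net:
    hidden = Layer(64)
    output = Layer(10)


def layer_names(model):
    """Collect node names in definition order (hypothetical helper)."""
    return [v.name for v in vars(model).values() if isinstance(v, Layer)]
```

In the string-free version, renaming `hidden` with an editor's refactoring tool also renames the node, which is the kind of tooling benefit the abstract describes.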
