Fashionable Modelling with Flux

Machine learning as a discipline has seen an incredible surge of interest in recent years, due in large part to a perfect storm of new theory, superior tooling, and renewed interest in its capabilities. We present in this paper a framework named Flux that shows how further refinement of the core ideas of machine learning, built upon the foundation of the Julia programming language, can yield an environment that is simple, easily modifiable, and performant. We detail the fundamental principles of Flux as a framework for differentiable programming, give examples of models implemented in Flux that display many of the language- and framework-level features contributing to its ease of use and high productivity, describe the internal compiler techniques used to enable the acceleration and performance that lie at the heart of Flux, and finally give an overview of the larger ecosystem into which Flux fits.
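
As a small illustration of the style of programming the abstract describes (a hedged sketch, not code taken from the paper; the layer sizes, dummy data, loss function, and exact constructor syntax are illustrative and vary across Flux versions), a model in Flux is an ordinary Julia value that can be called and differentiated directly:

using Flux

# A small multi-layer perceptron built from Flux's layer primitives.
model = Chain(
    Dense(784 => 32, relu),   # affine transform followed by a ReLU nonlinearity
    Dense(32 => 10),
    softmax)

# Dummy input and one-hot style target, purely for illustration.
x = rand(Float32, 784)
y = zeros(Float32, 10); y[3] = 1f0

# A hand-written squared-error loss; any Julia function of the model works.
loss(m, x, y) = sum((m(x) .- y) .^ 2)

# Reverse-mode AD over ordinary Julia code gives gradients of the loss
# with respect to every parameter nested inside `model`.
grads = gradient(m -> loss(m, x, y), model)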
