Statistical calculations at scale using machine learning algorithms and emulation

As datasets grow, many interesting statistical models become computationally inaccessible because they scale quadratically or worse. For some problems there exist fast algorithms that converge to the desired model, but such cases are rare: more often, the complex model we really want has no cheap exact counterpart. Short of simply discarding data, how can such a model be made to run while still using the available information efficiently? We describe a decision framework for computation in which computationally cheap approximate models are substituted for parts of the model of interest. By exploiting the correlation between the inexpensive approximation and the full model, far more efficient calculations become possible.
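
As a minimal sketch of the general idea (not the framework described here), the following Python example uses a control-variate style estimator: a handful of expensive full-model evaluations are corrected using a large number of cheap approximate-model evaluations, with the correction driven by the correlation between the two. The functions `expensive_model` and `cheap_model` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_model(x):
    # Stand-in for the full model: imagine each call is costly.
    return np.sin(x) + 0.05 * x**2

def cheap_model(x):
    # Stand-in for the approximation: fast, biased, but highly
    # correlated with the expensive model.
    return np.sin(x)

# A small sample on which we can afford the expensive model...
x_small = rng.normal(size=200)
f_small = expensive_model(x_small)
g_small = cheap_model(x_small)

# ...and a much larger sample on which only the cheap model is run.
x_large = rng.normal(size=200_000)
g_large = cheap_model(x_large)

# Control-variate coefficient estimated from the paired small sample.
beta = np.cov(f_small, g_small)[0, 1] / np.var(g_small, ddof=1)

# Corrected estimate of E[f]: the cheap model absorbs most of the variance.
estimate = f_small.mean() + beta * (g_large.mean() - g_small.mean())

naive = f_small.mean()  # what the expensive runs alone would give
print(f"control-variate estimate: {estimate:.4f}, naive estimate: {naive:.4f}")
```

The stronger the correlation between the approximation and the full model, the larger the variance reduction, which is what makes the cheap runs worth their (small) cost.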