Statistical calculations at scale using machine learning algorithms and emulation
As dataset sizes grow, most interesting statistical models become computationally inaccessible because they scale quadratically or worse. For some problems, fast algorithms exist that converge to the model of interest, but this is rarely the case: we often genuinely want to use a complex model. Beyond simply discarding data, how can such a model be made to run, and to use the available information efficiently? We describe a decision framework for computation in which computationally cheap approximate models can be substituted for part of the model of interest. By exploiting the correlation between the inexpensive approximation and the full model, far more efficient calculations become possible.
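To illustrate the general idea of exploiting the correlation between a cheap approximation and an expensive full model, the following is a minimal sketch of a two-fidelity, control-variate style estimator. It is not the paper's framework; the toy functions `expensive_model` and `cheap_model`, the subset size, and the quantity being estimated (a mean over the dataset) are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration only):
# `expensive_model` plays the role of the full model, affordable only on a subset;
# `cheap_model` is a correlated, inexpensive approximation available everywhere.
def expensive_model(x):
    return np.sin(x) + 0.1 * x**2

def cheap_model(x):
    return np.sin(x)  # misses the quadratic term, but strongly correlated

x_all = rng.uniform(0.0, 3.0, size=100_000)          # full dataset
x_sub = rng.choice(x_all, size=500, replace=False)   # small affordable subset

cheap_all = cheap_model(x_all)       # cheap model on every point
cheap_sub = cheap_model(x_sub)       # cheap model on the subset
full_sub = expensive_model(x_sub)    # expensive model only on the subset

# Control-variate correction: the coefficient is fit on the subset and
# encodes the correlation between the cheap and full model outputs.
beta = np.cov(full_sub, cheap_sub)[0, 1] / np.var(cheap_sub, ddof=1)

# Corrected estimate of the full-model mean over the whole dataset.
estimate = full_sub.mean() + beta * (cheap_all.mean() - cheap_sub.mean())

naive = full_sub.mean()  # subset-only estimate, for comparison
print(f"corrected estimate: {estimate:.4f}, subset-only estimate: {naive:.4f}")
```

When the cheap and full models are highly correlated, the corrected estimate typically has much lower variance than the subset-only estimate, even though the expensive model was evaluated on only a small fraction of the data.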