Ultrafast Monte Carlo for Statistical Summations

Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Computing these summations typically costs O(n^2) or more, which severely limits their application to large datasets. We present a multistage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. Unlike many previous scalability techniques (such as multi-tree methods), the method has stochastic error, but we derive conditions for error control and demonstrate empirically that they hold. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 10^14, many orders of magnitude beyond the previous state of the art.
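
As a rough illustration of the underlying idea, the sketch below estimates a nested kernel summation by plain uniform sampling of index pairs, adding samples until a central-limit-theorem confidence interval meets a relative error target. It is not the multistage stratified tree method described above: there is no stratification and no tree, and the Gaussian kernel, the function names, and all parameters (rel_err, z, batch, max_samples) are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between paired rows of a and b (illustrative choice)."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mc_nested_sum(x, rel_err=0.05, z=1.96, batch=1000,
                  max_samples=1_000_000, seed=0):
    """Estimate sum_i sum_j k(x_i, x_j) by uniform sampling of (i, j) pairs.

    Hypothetical helper: samples are added in batches until the CLT-based
    half-width z * s / sqrt(m) is at most rel_err * |sample mean|, or until
    max_samples is reached.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    values = np.empty(0)
    while values.size < max_samples:
        # Draw index pairs uniformly with replacement from the n * n grid.
        i = rng.integers(0, n, size=batch)
        j = rng.integers(0, n, size=batch)
        values = np.concatenate([values, gaussian_kernel(x[i], x[j])])
        mean = values.mean()
        half_width = z * values.std(ddof=1) / np.sqrt(values.size)
        if half_width <= rel_err * abs(mean):
            break
    # Scale the mean summand by the number of terms in the double sum.
    return n * n * mean, values.size


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(2000, 2))
    estimate, used = mc_nested_sum(data)
    print(f"estimate = {estimate:.1f} using {used} of {len(data) ** 2} terms")
```

In this simplified setting the stopping rule depends only on the ratio of the sample standard deviation to the sample mean of the summands, not on n, which gives some intuition for why a sample complexity independent of dataset size is plausible.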