Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Computing these summations exactly is typically O(n^2) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 10^14, many orders of magnitude beyond the previous state of the art.
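To make the core idea concrete, here is a minimal Python sketch of approximating a nested summation by Monte Carlo with a probabilistic relative-error stopping rule. This is not the paper's algorithm: it omits the tree-based stratification entirely and uses a plain normal-approximation (CLT) stopping criterion; all names and parameters (kernel, mc_nested_sum, eps, z, batch) are illustrative assumptions.

import numpy as np

def kernel(a, b, h=1.0):
    # Gaussian kernel between corresponding rows of a and b (illustrative choice)
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2.0 * h * h))

def mc_nested_sum(X, eps=0.05, z=1.96, batch=256, max_samples=200_000):
    # Estimate S = sum_{i,j} K(x_i, x_j) from uniformly sampled (i, j) pairs,
    # stopping once the CLT-based relative-error bound drops below eps.
    # z = 1.96 corresponds to roughly 95% confidence in the error bound.
    n = len(X)
    rng = np.random.default_rng(0)
    vals = np.empty(0)
    while vals.size < max_samples:
        i = rng.integers(n, size=batch)
        j = rng.integers(n, size=batch)
        vals = np.concatenate([vals, kernel(X[i], X[j])])
        m = vals.mean()
        se = vals.std(ddof=1) / np.sqrt(vals.size)
        if z * se <= eps * abs(m):  # estimated relative error within eps
            break
    return n * n * m  # rescale the per-term mean up to the full double sum

# Usage: compare against the exact O(n^2) double sum on a small dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
approx = mc_nested_sum(X)
exact = kernel(X[:, None, :], X[None, :, :]).sum()
print(approx, exact)

The sample size this stopping rule needs depends on the variance of the sampled terms rather than on n, which is the intuition behind the dataset-size-independent sample complexity claimed above; the paper's stratification in trees further reduces that variance.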