Skip to yearly menu bar Skip to main content


Poster

Multi-Stage Monte Carlo Approximation for Fast Generalized Data Summations

Michael Holmes · Alexander Gray · Charles Isbell


Abstract: In machine learning we often encounter computational bottlenecks in the form of generalized summations over datasets. When the computational cost of these methods is $O(n^2)$ or higher, applicability to large datasets is severely limited. We present a multi-stage Monte Carlo method for approximating such summations with probabilistic relative error control. This method differs from many previous scalability techniques (such as tree-based methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as $10^{14}$ on datasets with points numbering in the millions.

Live content is unavailable. Log in and register to view live content