A One-Pass Space-Efficient Algorithm for Finding Quantiles

We present an algorithm for nding the quantile values of a large unordered dataset with unknown distribution. The algorithm has the following features: i) it requires only one pass over the data; ii) it is space e cient | it uses a small bounded amount of memory independent of the number of values in the dataset; and iii) the true quantile is guaranteed to lie within the lower and upper bounds produced by the algorithm. Empirical evaluation using synthetic data with various distributions as well as real data show that the bounds obtained are quite tight. The algorithm has several applications in database systems, for example in database governors, query optimization, load balancing in multiprocessor database systems, and data mining.