Star Discrepancy Subset Selection: Problem Formulation and Efficient Approaches for Low Dimensions

Motivated by applications in instance selection, we introduce the star discrepancy subset selection problem, which consists of finding a subset of m out of n points that minimizes the star discrepancy. First, we show that this problem is NP-hard. Then, we introduce a mixed integer linear programming (MILP) formulation and a combinatorial branch-and-bound (BB) algorithm for the problem, and we evaluate both approaches against random subset selection and a greedy construction on different use cases in dimensions two and three. Our results show that, in dimension two and for moderate n, the MILP is efficient when the ratio m/n is large and the BB is efficient when it is small. However, the performance of both approaches degrades sharply in higher dimensions and for larger set sizes. As a by-product of our empirical comparisons, we obtain point sets whose discrepancy values are much smaller than those of common low-discrepancy sequences, random point sets, and Latin Hypercube Sampling. This suggests that subset selection could be an interesting approach for generating point sets of small discrepancy value.
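
To make the problem statement concrete, the following is a minimal sketch of the objective and of an exhaustive baseline. It is not the MILP or the BB algorithm of the paper. It assumes points given as tuples in [0,1]^d, evaluates the star discrepancy exactly on the grid of point coordinates (a standard but exponential-in-d approach), and enumerates all subsets of size m, which is only feasible for very small n. The function names `star_discrepancy` and `best_subset` are hypothetical and chosen for illustration.

```python
from itertools import combinations, product


def star_discrepancy(points):
    """Exact star discrepancy of points in [0,1]^d via the critical grid.

    The supremum over anchored boxes [0, q) is attained (in the limit) at
    corners q whose coordinates come from the point coordinates or 1.
    Cost grows like (n + 1)^d * n, so this is only a small-scale sketch.
    """
    n = len(points)
    d = len(points[0])
    # Candidate corner coordinates per dimension: point coordinates plus 1.
    grids = [sorted({p[j] for p in points} | {1.0}) for j in range(d)]
    worst = 0.0
    for q in product(*grids):
        vol = 1.0
        for c in q:
            vol *= c
        # Points strictly inside the open box [0, q).
        n_open = sum(all(p[j] < q[j] for j in range(d)) for p in points)
        # Points inside the closed box [0, q].
        n_closed = sum(all(p[j] <= q[j] for j in range(d)) for p in points)
        # Box too empty (open count) or too full (closed count).
        worst = max(worst, vol - n_open / n, n_closed / n - vol)
    return worst


def best_subset(points, m):
    """Brute-force subset selection: try every subset of size m."""
    return min(combinations(points, m), key=star_discrepancy)
```

For instance, `best_subset(candidates, m)` on a small list of 2D tuples returns the m-tuple of points with the smallest star discrepancy among all C(n, m) subsets. The number of subsets grows combinatorially, which is consistent with the need for dedicated exact methods and heuristics discussed in the abstract.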
