The Computational Complexity of Densest Region Detection

We investigate the computational complexity of the task of detecting dense regions of an unknown distribution from unlabeled samples of this distribution. We introduce a formal learning model for this task that uses a hypothesis class as it “anti-overfitting” mechanism. The learning task in our model can be reduced to a combinatorial optimization problem. We can show that for some constants, depending on the hypothesis class, these problems are NP-hard to approximate to within these constant factors. We go on and introduce a new criterion for the success of approximate optimization geometric problems. The new criterion requires that the algorithm competes with hypotheses only on the points that are separated by some margin ? from their boundaries. Quite surprisingly, we discover that for each of the two hypothesis classes that we investigate, there is a “critical value” of the margin parameter ?. For any value below the critical value the problems are NP-hard to approximate, while, once this value is exceeded, the problems become poly-time solvable.