Agnostic Clustering

Motivated by the principle of agnostic learning, we present an extension of the model introduced by Balcan, Blum, and Gupta [3] on computing low-error clusterings. The extended model uses a weaker assumption on the target clustering, which captures data clustering in the presence of outliers or ill-behaved data points. Under our new property, unlike under the original target clustering property, it may no longer be the case that all plausible target clusterings are close to each other. Instead, we present algorithms that produce a small list of clusterings with the guarantee that every clustering satisfying the assumption is close to some clustering in the list, and we prove both upper and lower bounds on the length of the list needed.