Exact expressions for structure selection in cluster expansions

The cluster expansion has proven to be a valuable tool in materials science for predicting the properties of configurationally ordered and disordered structures, but generating cluster expansions can be computationally expensive. In recent years, there have been efforts to make this process more efficient by selecting training structures in a way that minimizes approximate expressions for the variance of the predicted property values. We demonstrate that in many cases these approximations are unnecessary, and exact expressions for the variance of the predicted property values can be derived. To illustrate this result, we present examples based on common applications of the cluster expansion, such as bulk binary alloys. In addition, we extend these structure selection techniques to Bayesian cluster expansions. These results should enable researchers to better analyze the quality of existing training sets and to select training structures that yield cluster expansions with lower prediction error.