Optimal shapes for kernel density estimation

In the early years of kernel density estimation, Watson and Leadbetter (1963) attempted to optimize kernel shape for fixed sample sizes by minimizing the expected L2 distance between the kernel density estimate and the true density. Perhaps out of technical necessity, they did not impose the constraint that the kernel be a probability density function. The present paper uses recent developments in the theory of infinite programming to successfully impose that constraint. Necessary and sufficient conditions for solution of the constrained problem are derived. These conditions are not trivial; however, they can be exploited to demonstrate that symmetric densities with sufficiently light tails have optimal kernels with compact support.
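
For orientation, the constrained problem described above can be written out roughly as follows. This is only a sketch in assumed notation: the symbols f (true density), X_1, ..., X_n (the sample), K (the kernel, with any bandwidth absorbed into its shape for the fixed sample size n), and \hat f_n (the estimate) are not taken from the text, and the criterion is read here as the expected integrated squared error.

```latex
% Kernel density estimate for a fixed sample size n (assumed notation)
\hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - X_i)

% Criterion: expected integrated squared error,
% with the added requirement that K be a probability density
\min_{K} \; \mathbb{E} \int \bigl( \hat f_n(x) - f(x) \bigr)^2 \, dx
\quad \text{subject to} \quad
K(u) \ge 0 \ \text{for all } u, \qquad \int K(u) \, du = 1 .
```

Dropping the two side conditions on K gives the unconstrained version of the problem, in which the minimizer need not be a probability density.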