Smoothing dissimilarities to cluster binary data

Cluster analysis attempts to group data objects into homogeneous clusters on the basis of the pairwise dissimilarities among the objects. When the data contain noise, we might consider performing a smoothing operation, either on the data themselves or on the dissimilarities, before implementing the clustering algorithm. Possible benefits to such pre-smoothing are discussed in the context of binary data. We suggest a method for cluster analysis of binary data based on ''smoothed'' dissimilarities. The smoothing method presented borrows ideas from shrinkage estimation of cell probabilities. Some simulation results are given showing that improvement in the accuracy of the clustering result is obtained via smoothing, especially in the case in which the observed data contain substantial noise. The method is illustrated with an example involving binary test item response data.