Learning Filaments

This paper is about new statistics and new efficient algorithms for a form of mixture model that learns filamentary structures. Such models are important in several areas of scientific data analysis, but in this paper our main example is identification of large-scale structure among galaxies. We describe software which can extract the positions of spherical and line-shaped clusters from data about the locations of objects such as galaxies. We do so by fitting a particular type of Gaussian mixture model to the galaxy locations. The most interesting feature of our model is that it directly represents line segments in the distribution, unlike standard Gaussian mixture models which can only handle ellipses. Because we fit the line segments directly, we do not need to do any post-processing to extract their locations. We use a modification of the k-means algorithm to find model parameters. Since our software needs to deal with large data sets, it is important to accelerate model-fitting as much as possible. So, we store the galaxy locations in a multi-resolution kd-tree, and we introduce new pruning algorithms that allow us to skip over large parts of the tree in each k-means step. We provide evaluations on both synthetic and real data sets.
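
To make the idea of a k-means step over line segments concrete, the following is a minimal two-dimensional sketch and not the method of this paper. It assumes hard assignment of each point to its nearest segment, and a refit that places each segment along the principal axis of its assigned points. The function names (`dist_to_segment`, `fit_segment`, `segment_kmeans`) are hypothetical, the Gaussian noise model is ignored, and the kd-tree acceleration and pruning are omitted entirely. A spherical cluster corresponds here to a degenerate segment whose endpoints coincide.

```python
import math


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        # Degenerate segment: behaves like an ordinary k-means center.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def fit_segment(points):
    """Segment along the principal axis of the points, spanning their extent."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Closed-form direction of the leading eigenvector of a 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [(x - mx) * ux + (y - my) * uy for x, y in points]
    lo, hi = min(proj), max(proj)
    return (mx + lo * ux, my + lo * uy), (mx + hi * ux, my + hi * uy)


def segment_kmeans(points, segments, iters=10):
    """Alternate nearest-segment assignment with per-cluster segment refits."""
    for _ in range(iters):
        groups = [[] for _ in segments]
        for p in points:
            j = min(range(len(segments)),
                    key=lambda k: dist_to_segment(p, *segments[k]))
            groups[j].append(p)
        # Keep the old segment when a cluster receives no points.
        segments = [fit_segment(g) if g else s
                    for g, s in zip(groups, segments)]
    return segments
```

Every iteration of this sketch scans all points against all segments, which is the cost that a tree-based pruning scheme would aim to avoid on large data sets.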