Fast nonlinear regression via eigenimages applied to galactic morphology

Astronomy increasingly faces the issue of massive, unwieldly data sets. The Sloan Digital Sky Survey (SDSS) [11] has so far generated tens of millions of images of distant galaxies, of which only a tiny fraction have been morphologically classified. Morphological classification in this context is achieved by fitting a parametric model of galaxy shape to a galaxy image. This is a nonlinear regression problem, whose challenges are threefold, 1) blurring of the image caused by atmosphere and mirror imperfections, 2) large numbers of local minima, and 3) massive data sets.Our strategy is to use the eigenimages of the parametric model to form a new feature space, and then to map both target image and the model parameters into this feature space. In this low-dimensional space we search for the best image-to-parameter match. To search the space, we sample it by creating a database of many random parameter vectors (prototypes) and mapping them into the feature space. The search problem then becomes one of finding the best prototype match, so the fitting process a nearest-neighbor search.In addition to the savings realized by decomposing the original space into an eigenspace, we can use the fact that the model is a linear sum of functions to reduce the prototypes further: the only prototypes stored are the components of the model function. A modified form of nearest neighbor is used to search among them.Additional complications arise in the form of missing data and heteroscedasticity, both of which are addressed with weighted linear regression. Compared to existing techniques, speed-ups ach-ieved are between 2 and 3 orders of magnitude. This should enable the analysis of the entire SDSS dataset.