Photometric Redshift Estimation on SDSS Data Using Random Forests

Given multiband photometric data from SDSS DR6, we estimate galaxy redshifts. We train a Random Forest on color features and spectroscopic redshifts from 80,000 randomly chosen primary galaxies, yielding a mapping from color to redshift such that the difference between the estimate and the spectroscopic redshift is small. Our methodology results in tight RMS scatter in the estimates, limited by photometric errors. Additionally, this approach yields an error distribution that is nearly Gaussian, with parameter estimates giving reliable confidence intervals unique to each galaxy photometric redshift.

1. The Problem

We are given five bands of photometric data from the Sloan Digital Sky Survey Data Release 6 (SDSS DR6). Associated with each magnitude measurement is an error measurement and an extinction measurement used to correct for the effect of Galactic dust. For some objects we have spectroscopic redshifts, hereafter denoted $z_{\mathrm{spec}}$ for a particular object. We wish to estimate photometric redshifts for non-spectroscopic objects that are represented by the spectroscopic sample.

We are currently interested only in galaxies with $10^{-4} \le z_{\mathrm{spec}} \le 1$, and we exclude other objects that are likely to "contaminate" our sample. We further choose for the moment not to consider Luminous Red Galaxies (LRGs), leaving us with some 400,000 objects with which to train and test.

In our current work we use magnitudes available in the new Ubercal table (Padmanabhan 2007) for each object. We subtract sequential extinction-corrected magnitudes to get color features, named u, g, r, and i. For instance, our color u is actually the magnitude difference $u - g$, and so on.
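
The color construction and redshift cut described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The dictionary keys, the function names `color_features` and `in_redshift_range`, and the example magnitudes are hypothetical. The sketch assumes that the extinction correction is applied by subtracting the extinction value from the observed magnitude, and that the redshift limits are inclusive with a lower bound of 10^-4.

```python
# Illustrative sketch only: names and values are hypothetical,
# not taken from the SDSS schema or from the authors' code.

BANDS = ("u", "g", "r", "i", "z")  # the five SDSS photometric bands


def color_features(mags, extinction):
    """Return the four sequential colors (u-g, g-r, r-i, i-z).

    Assumption: the extinction correction is a simple subtraction of the
    per-band extinction from the observed magnitude.
    """
    corrected = [mags[b] - extinction[b] for b in BANDS]
    # Difference each corrected magnitude with the next redder band.
    return [corrected[k] - corrected[k + 1] for k in range(len(BANDS) - 1)]


def in_redshift_range(z_spec, z_min=1e-4, z_max=1.0):
    """Keep only objects inside the redshift interval of interest.

    Assumption: both limits are inclusive.
    """
    return z_min <= z_spec <= z_max


# Made-up magnitudes and extinctions for a single galaxy.
example_mags = {"u": 19.50, "g": 18.20, "r": 17.60, "i": 17.30, "z": 17.10}
example_ext = {"u": 0.10, "g": 0.08, "r": 0.06, "i": 0.04, "z": 0.03}

if in_redshift_range(0.12):
    features = color_features(example_mags, example_ext)
    print(features)  # four colors, named u, g, r, i in the text
```

Each galaxy is thus reduced to a four-element feature vector, which is the form of input the Random Forest is trained on alongside the spectroscopic redshift.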