REDUCING THE DIMENSIONALITY OF DATA: LOCALLY LINEAR EMBEDDING OF SLOAN GALAXY SPECTRA

We introduce locally linear embedding (LLE) to the astronomical community as a new classification technique, using Sloan Digital Sky Survey spectra as an example data set. LLE is a nonlinear dimensionality reduction technique that has been studied in the context of computer perception. We compare the performance of LLE with that of well-known spectral classification techniques, namely principal component analysis and line-ratio diagnostics. We find that LLE combines the strengths of both methods in a single, coherent technique and leads to improved classification of emission-line spectra at a relatively small computational cost. We also present a data subsampling technique that preserves local information content and proves effective for creating small, efficient training samples from large, high-dimensional data sets. Software used in this LLE-based classification is made available.
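To make the idea of LLE concrete, the following is a minimal NumPy sketch of the standard algorithm: find each point's nearest neighbors, solve for the weights that best reconstruct the point from those neighbors, then find low-dimensional coordinates that preserve the same weights. It is not the software released with the paper. The function name `lle`, its parameters, the regularization scheme, and the synthetic spiral data (standing in for spectra) are all assumptions made for illustration.

```python
import numpy as np


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Illustrative LLE: rows of X are samples, columns are features."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = n_neighbors

    # Step 1: k nearest neighbors from pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Step 2: reconstruction weights, constrained to sum to one per point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]          # neighbors centered on point i
        C = Z @ Z.T                    # local Gram matrix (k x k)
        trace = np.trace(C)
        # Regularize so the solve is stable when k exceeds the local dimension.
        C.flat[:: k + 1] += reg * trace if trace > 0 else reg
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()

    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    # Skip the first eigenvector (constant, eigenvalue near zero).
    return vecs[:, 1 : n_components + 1]


# Synthetic stand-in for high-dimensional data: a 2D sheet rolled up in 3D.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 3.0 * np.pi, 200)
X = np.column_stack([t * np.cos(t), rng.uniform(0.0, 5.0, 200), t * np.sin(t)])
Y = lle(X, n_neighbors=10, n_components=2)
```

Here `Y` holds one two-dimensional coordinate pair per input sample. For real spectra, each row of `X` would be one spectrum resampled onto a common wavelength grid, and the dense eigendecomposition would likely need to be replaced by a sparse solver for large samples.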
