AUTOMATIC UNSUPERVISED CLASSIFICATION OF ALL SLOAN DIGITAL SKY SURVEY DATA RELEASE 7 GALAXY SPECTRA

Using the k-means cluster analysis algorithm, we carry out an unsupervised classification of all galaxy spectra in the seventh and final Sloan Digital Sky Survey data release (SDSS/DR7). Except for the shift to rest-frame wavelengths and the normalization to the g-band flux, no manipulation is applied to the original spectra. The algorithm guarantees that galaxies with similar spectra belong to the same class. We find that 99% of the galaxies can be assigned to only 17 major classes, with 11 additional minor classes containing the remaining 1%. The classification is not unique, since many galaxies lie in between classes; however, our implementation of the algorithm overcomes this weakness with a tool to identify borderline galaxies. Each class is characterized by a template spectrum, which is the average of all the spectra of the galaxies in the class. These low-noise template spectra vary smoothly and continuously along a sequence labeled from 0 to 27, from the reddest class to the bluest class. Our Automatic Spectroscopic K-means-based (ASK) classification separates galaxies by color, with classes characteristic of the red sequence, the blue cloud, and the green valley. When red sequence and green valley galaxies present emission lines, these lines are characteristic of AGN activity. Blue galaxy classes have emission lines corresponding to star-forming regions. We find the expected correlation between spectroscopic class and Hubble type, but this relationship exhibits a high intrinsic scatter. Several potential uses of the ASK classification are identified and sketched, including fast determination of physical properties by interpolation, use of the classes as templates in redshift determinations, and target selection for follow-up work (we find classes of Seyfert galaxies, green valley galaxies, as well as a significant number of outliers). The ASK classification is publicly accessible through various websites.
Subject headings: catalogs – methods: statistical – galaxies: evolution – galaxies: fundamental parameters – galaxies: statistics
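
The procedure summarized in the abstract (normalize each rest-frame spectrum, group the spectra with k-means, average each class into a template, and flag galaxies that sit between classes) can be illustrated with a minimal Python sketch. It is not the ASK code: the synthetic "red" and "blue" spectra, the 4000-5500 Å window standing in for the g band, the function names, and the nearest-to-second-nearest distance ratio used as a borderline score are all assumptions made here for illustration.

```python
import numpy as np

def normalize(spectra, wave, lo=4000.0, hi=5500.0):
    # Divide each spectrum by its mean flux in a window that stands in
    # for the g band (assumed limits, not the real filter curve).
    window = (wave >= lo) & (wave <= hi)
    return spectra / spectra[:, window].mean(axis=1, keepdims=True)

def kmeans(spectra, k, n_iter=50, seed=0):
    # Plain k-means: assign each spectrum to its nearest center, then
    # replace each center by the average spectrum of its members.
    rng = np.random.default_rng(seed)
    centers = spectra[rng.choice(len(spectra), size=k, replace=False)].copy()
    labels = np.full(len(spectra), -1)
    for _ in range(n_iter):
        d2 = ((spectra[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = spectra[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)  # class template
    return labels, centers

def borderline_score(spectra, templates):
    # Hypothetical borderline indicator: distance to the nearest template
    # divided by distance to the second nearest. Values near 1 mean the
    # spectrum lies between two classes; values near 0 mean a clear member.
    d = np.sqrt(((spectra[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2))
    d.sort(axis=1)
    return d[:, 0] / d[:, 1]

# Synthetic data: 50 "red" and 50 "blue" noisy spectra of varying brightness.
wave = np.linspace(4000.0, 7000.0, 200)
rng = np.random.default_rng(1)
spectra = np.vstack([
    wave / 5500.0 + 0.02 * rng.standard_normal((50, wave.size)),
    5500.0 / wave + 0.02 * rng.standard_normal((50, wave.size)),
])
spectra *= rng.uniform(0.5, 2.0, size=(100, 1))

norm = normalize(spectra, wave)
labels, templates = kmeans(norm, k=2)
score = borderline_score(norm, templates)
```

In this toy setup the two synthetic populations end up in separate classes and each row of `templates` is the average of the normalized spectra in one class, mirroring the way the abstract defines a class template. With real spectra, the number of classes and the initialization strategy would need far more care than this sketch gives them.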
