Machine learning in APOGEE: Identification of stellar populations through chemical abundances

The vast volume of data generated by modern astronomical surveys offers a test bed for the application of machine learning. It is important to evaluate existing tools and determine which are optimal for extracting scientific knowledge from the available observations. We explore the possibility of using clustering algorithms to separate stellar populations with distinct chemical patterns. Star clusters are likely the most chemically homogeneous populations in the Galaxy, and therefore any practical approach to identifying distinct stellar populations should at least be able to separate clusters from each other. We apply eight clustering algorithms combined with four dimensionality reduction strategies to automatically distinguish stellar clusters using chemical abundances of 13 elements. Our sample includes 18 stellar clusters with a total of 453 stars. Statistical tests show that some pairs of clusters are indistinguishable from each other when chemical abundances from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) are used. However, for most clusters we are able to automatically assign membership with metric scores similar to those of previous works. The confusion level of the automatically selected clusters is consistent with the statistical tests, which demonstrate the impossibility of perfectly distinguishing all the clusters from each other. These statistical tests and confusion levels establish a limit for the prospect of blindly identifying stars born in the same cluster based solely on chemical abundances. We find that some of the algorithms we explored are capable of blindly identifying stellar populations with similar ages and chemical distributions in the APOGEE data. Because some stellar clusters are chemically indistinguishable, our study supports the notion of extending weak chemical tagging to families of clusters instead of individual clusters.
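The kind of pipeline described above (dimensionality reduction, then clustering, then scoring against known membership) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not name the eight algorithms or four reduction strategies, so PCA and K-means stand in for them, scikit-learn is assumed as the library, the V-measure is assumed as the score, and the abundance table is synthetic rather than APOGEE data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import homogeneity_completeness_v_measure
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an abundance table (NOT APOGEE data).
# Sizes are illustrative; the paper's sample is 18 clusters and 453 stars.
rng = np.random.default_rng(0)
n_clusters, stars_per_cluster, n_elements = 3, 25, 13

# Each mock cluster gets its own mean abundance pattern (in dex) and a
# small internal scatter, mimicking a chemically homogeneous population.
centers = rng.uniform(-1.0, 0.5, size=(n_clusters, n_elements))
abundances = np.vstack([
    center + rng.normal(0.0, 0.03, size=(stars_per_cluster, n_elements))
    for center in centers
])
true_labels = np.repeat(np.arange(n_clusters), stars_per_cluster)

# Standardize each element, reduce to two components, then cluster.
# PCA and K-means are assumed choices, not the paper's confirmed method.
scaled = StandardScaler().fit_transform(abundances)
reduced = PCA(n_components=2, random_state=0).fit_transform(scaled)
predicted = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)

# Score the blind assignment against the known cluster membership.
homogeneity, completeness, v = homogeneity_completeness_v_measure(true_labels, predicted)
print(f"homogeneity={homogeneity:.2f} completeness={completeness:.2f} V-measure={v:.2f}")
```

In this toy setup the mock clusters are far apart relative to their scatter, so they separate easily. Chemically similar clusters, which the statistical tests above find indistinguishable, would lower the scores regardless of the algorithm used.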
