A distributed learning algorithm for Self-Organizing Maps intended for outlier analysis in the GAIA - ESA mission

Since its launch in December 2013, the Gaia space mission has been collecting vast amounts of information about the objects that populate our Galaxy and beyond. The international Gaia Data Processing and Analysis Consortium (DPAC) is in charge of developing the computer algorithms that extract and process astrophysical information from these observations. Its work is organized into work packages; one of them, Outlier Analysis, is dedicated to exploring the large number of outlier objects detected during the main classification of the observations. We present a method based on Self-Organizing Maps (SOM) and parallelized with the Hadoop framework to improve its performance. We also compare the execution times of the sequential and distributed versions of the algorithm.
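
The abstract does not say how the SOM training is split across Hadoop tasks. As a minimal sketch of one common decomposition, assuming a batch SOM update with a Gaussian neighborhood (an assumption, not necessarily the authors' implementation), each map task can emit per-neuron partial sums for its data partition and a reduce step can merge them into new weights. Plain Python functions stand in for Hadoop mappers and reducers here, and every function name is hypothetical.

```python
import math

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def neighborhood(grid, a, b, sigma):
    """Gaussian neighborhood between neurons a and b, measured on the map grid."""
    d2 = sum((p - q) ** 2 for p, q in zip(grid[a], grid[b]))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def map_partition(weights, grid, sigma, samples):
    """Map step (assumed): one data partition emits partial sums per neuron."""
    partial = {}
    for x in samples:
        bmu = best_matching_unit(weights, x)
        for k in range(len(weights)):
            h = neighborhood(grid, bmu, k, sigma)
            num, den = partial.get(k, ([0.0] * len(x), 0.0))
            partial[k] = ([n + h * v for n, v in zip(num, x)], den + h)
    return partial

def reduce_partials(weights, partials):
    """Reduce step (assumed): merge partial sums and compute the new weights."""
    new_weights = []
    for k, old in enumerate(weights):
        num, den = [0.0] * len(old), 0.0
        for p in partials:
            if k in p:
                pn, pd = p[k]
                num = [a + b for a, b in zip(num, pn)]
                den += pd
        # Keep the old weight if no sample contributed to this neuron.
        new_weights.append([n / den for n in num] if den > 0 else list(old))
    return new_weights

def batch_epoch(weights, grid, sigma, partitions):
    """One batch SOM epoch over a list of data partitions."""
    partials = [map_partition(weights, grid, sigma, part) for part in partitions]
    return reduce_partials(weights, partials)
```

Because the batch update is a ratio of sums over all samples, the partial sums can be computed independently per partition and merged afterwards, so one partition (the sequential case) and many partitions give the same weights up to floating-point rounding.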
