Drifting Features: Detection and evaluation in the context of automatic RRLs identification in VVV

Context. As most modern astronomical sky surveys produce data faster than humans can analyze it, Machine Learning (ML) has become a central tool in Astronomy. Modern ML methods are highly robust to some kinds of experimental error. However, small changes in the data over long angular distances or long periods of time, which cannot be easily detected by statistical methods, can be harmful to these methods.

Aims. We develop a new strategy to cope with this problem, using ML methods in an innovative way to identify these potentially harmful features.

Methods. We introduce and discuss the notion of Drifting Features, which is related to small changes in the properties measured by the data features. We take the identification of RR Lyrae stars (RRLs) in the VVV survey, based on an earlier work, as our test case and introduce a method for detecting Drifting Features. In VVV, each sky observation zone is called a tile. Our method forces the classifier to learn, from the sources (mostly stellar 'point sources'), the tile from which they originate, and then selects the features most relevant to that task as candidate Drifting Features.

Results. We show that this method can efficiently identify a reduced set of features that contains useful information about the tile of origin of the sources. For our particular example of detecting RRLs in VVV, we find that Drifting Features are mostly related to color indices. On the other hand, we show that, even though our problem has a clear set of Drifting Features, the identification of RRLs is mostly insensitive to them.

Conclusions. Drifting Features can be efficiently identified using ML methods. However, in our example, removing Drifting Features does not improve the identification of RRLs.
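The detection idea described under Methods (predict the tile of origin from the features, then keep the features that make this prediction possible) can be illustrated with a minimal sketch. This is not the classifier or the selection procedure used in the paper, which the abstract does not specify. As a deliberately simple stand-in, each feature is scored by the best accuracy a one-threshold rule (a decision stump) reaches when separating two tiles. The data are synthetic, and the feature names, the size of the injected shift, and the 0.6 cut-off are assumptions made only for illustration.

```python
import random


def stump_accuracy(values, tiles):
    """Best in-sample accuracy of a one-threshold rule predicting a 0/1 tile label."""
    pairs = sorted(zip(values, tiles))
    n = len(pairs)
    total_ones = sum(t for _, t in pairs)
    best = max(total_ones, n - total_ones) / n  # majority-class baseline
    ones_left = 0
    for i, (value, t) in enumerate(pairs[:-1], start=1):
        ones_left += t
        if value == pairs[i][0]:
            continue  # no valid threshold between two equal values
        # Rule A: predict tile 0 left of the cut and tile 1 right of it.
        correct_a = (i - ones_left) + (total_ones - ones_left)
        # Rule B is the mirror image of rule A.
        correct_b = n - correct_a
        best = max(best, correct_a / n, correct_b / n)
    return best


def rank_drifting_candidates(features, tiles):
    """Sort features by how well each one alone predicts the tile of origin."""
    scores = {name: stump_accuracy(col, tiles) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic example: two tiles, three hypothetical features.
# Only "J_minus_Ks" receives an artificial shift between the tiles.
rng = random.Random(0)
n_per_tile = 1000
tiles = [0] * n_per_tile + [1] * n_per_tile
features = {
    "period": [rng.gauss(0.0, 1.0) for _ in tiles],
    "amplitude": [rng.gauss(0.0, 1.0) for _ in tiles],
    "J_minus_Ks": [rng.gauss(1.0 * t, 1.0) for t in tiles],  # assumed drift
}

ranking = rank_drifting_candidates(features, tiles)
# Assumed cut-off: accuracy well above the 0.5 chance level flags a candidate.
candidates = [name for name, acc in ranking if acc > 0.6]

for name, acc in ranking:
    print(f"{name:12s} tile-prediction accuracy = {acc:.3f}")
print("candidate Drifting Features:", candidates)
```

A feature whose distribution is the same in both tiles scores close to 0.5, so only features that carry information about the tile of origin rise to the top of the ranking. A multivariate classifier with a proper train/test split would be the natural next step; the stump only shows the principle.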
