Detecting Variability in Massive Astronomical Time-Series Data I: application of an infinite Gaussian mixture model

We present a new framework to detect various types of variable objects within massive astronomical time-series data. Assuming that the dominant population of objects is non-variable, we identify outliers from this population using a non-parametric Bayesian clustering algorithm based on an infinite Gaussian Mixture Model (GMM) and the Dirichlet Process. The algorithm extracts information from a given dataset, in which each object is described by six variability indices. The GMM uses those variability indices to recover clusters, each described by a six-dimensional multivariate Gaussian distribution. Because the clusters are learned from the data themselves, our approach accounts for the sampling pattern of the time-series data, systematic biases, the number of data points in each light curve, and photometric quality. We test our approach on Northern Sky Variability Survey data and show that the infinite GMM is useful for detecting variable objects, while providing statistical inference that suppresses false detections. The proposed approach will be effective in the exploration of future surveys such as Gaia, Pan-STARRS, and LSST, which will produce massive time-series data.
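
As a rough illustration of why a Dirichlet Process prior lets the mixture be "infinite", the sketch below draws mixture weights with the standard stick-breaking construction. It is not the authors' implementation: the function name `stick_breaking_weights`, the concentration value `alpha=1.0`, and the truncation at 20 components are assumptions chosen only for this example.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw the first `truncation` mixture weights of a Dirichlet Process.

    Hypothetical helper for illustration only. Each step breaks off a
    Beta(1, alpha) fraction of the stick that is still left, so the
    weights are non-negative and, together with the leftover piece,
    sum to one.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


# alpha and truncation are illustrative values, not taken from the paper.
weights, leftover = stick_breaking_weights(alpha=1.0, truncation=20)
print("largest weight:", max(weights))
print("mass left for further components:", leftover)
```

A small `alpha` tends to put most of the mass on a few components, which matches the assumption of one dominant non-variable population plus a handful of small clusters; a larger `alpha` spreads the mass over more components. In a full infinite GMM each weight would be paired with a six-dimensional Gaussian over the variability indices.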
