A Common-Factor Approach for Multivariate Data Cleaning with an Application to Mars Phoenix Mission Data

Data quality is fundamental to the reliability of the data on which stakeholders base their decisions. In real-world applications, such as the scientific exploration of extreme environments, it is unrealistic to expect the raw data collected to be perfect. When it is infeasible for data miners to determine physically why and how the data were corrupted, we propose instead to seek the intrinsic structure of the signal and identify the common factors of the multivariate data. Our new data-driven learning method, the common-factor data cleaning approach, addresses an interdisciplinary challenge in multivariate data cleaning: complex external influences that appear to interfere with multiple data measurements at once. Existing data analyses typically process one signal measurement at a time, without considering the associations among signals. We analyze all signal measurements simultaneously to find the hidden common factors that drive the measurements to vary together but do not originate from the quantities actually being measured. We then use these common factors to reduce the variation in the data without changing its base mean level, so that the physical meaning of the data is not altered.
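The idea of removing shared variation while keeping each signal's mean level can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, for illustration only, that the common factors can be approximated by the leading principal components of the mean-centered data. The function name `remove_common_factors`, the parameter `n_factors`, and the synthetic signals are all hypothetical.

```python
import numpy as np

def remove_common_factors(X, n_factors=1):
    """Subtract the leading shared components from mean-centered signals.

    X has shape (n_samples, n_signals). PCA via SVD is used here as an
    assumed stand-in for common-factor estimation, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)            # base mean level of each signal
    centered = X - means
    # Leading singular vectors capture variation shared across signals.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    # Remove the shared variation, then restore the original mean levels.
    return centered - common + means

# Synthetic data (hypothetical): four signals disturbed by one shared drift.
rng = np.random.default_rng(0)
n = 500
drift = np.cumsum(rng.normal(size=n))          # shared external disturbance
levels = np.array([5.0, -2.0, 10.0, 0.5])      # true mean levels
loadings = np.array([1.0, 0.8, -1.2, 0.5])     # effect of drift on each signal
X = levels + np.outer(drift, loadings) + 0.1 * rng.normal(size=(n, 4))

cleaned = remove_common_factors(X, n_factors=1)
```

In this sketch the column means of `cleaned` match those of `X` up to floating-point error, because the removed component is itself mean-zero, while the variance of each column cannot increase. Choosing how many factors to remove, and deciding whether a shared component is a disturbance or a real signal, are the hard parts and are not addressed here.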
