A statistical analysis of noisy crowdsourced weather data

Spatial prediction of weather elements such as temperature, precipitation, and barometric pressure is generally based on satellite imagery or data collected at ground stations. Neither source provides information at a more granular, or "hyper-local," resolution. Crowdsourced weather data, by contrast, are captured by sensors installed on mobile devices and gathered by weather-related mobile apps such as WeatherSignal and AccuWeather, and can serve as potential data sources for analyzing environmental processes at a hyper-local resolution. However, because the sensors are of low quality and operate in a non-laboratory environment, the quality of the crowdsourced observations is compromised. This paper describes methods to improve hyper-local spatial prediction using such noisy crowdsourced information of varying quality. We introduce a reliability metric, the Veracity Score (VS), to assess the quality of the crowdsourced observations using coarser but high-quality reference data. A VS-based methodology for analyzing noisy spatial data is proposed and evaluated through extensive simulations. The merits of the proposed approach are illustrated through case studies analyzing crowdsourced daily average ambient temperature readings for one day in the contiguous United States.
