Geographical Imputation of Missing Poaceae Pollen Data via Convolutional Neural Networks

Airborne pollen monitoring datasets sometimes contain gaps, occasionally very long ones, caused by instrument maintenance or a lack of expert personnel. Although numerous imputation techniques are available, not all of them effectively account for the spatial relations in the data, because they assume that values are missing at random. Several geostatistical techniques overcome this limitation, such as inverse distance weighting and Gaussian process regression (kriging). This paper proposes a new method based on convolutional neural networks. Compared with these techniques, the method not only improves accuracy, reducing the error by 5% on average, but also cuts training time by 90% relative to a Gaussian process. To demonstrate the advantages of the proposal, 10%, 20%, and 30% of the data points are removed from the time series of a Poaceae pollen observation station in the Madrid region, and the airborne concentrations from the remaining stations in the network are used to impute the removed values. Although the accuracy improvements are modest, they are consistent. Moreover, the gain in computational time and the flexibility of the proposed convolutional neural network allow field experts to adapt and extend the solution, for instance by including meteorological variables, which could further reduce the errors reported in this paper.
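
To make the evaluation protocol and the inverse distance weighting (IDW) baseline concrete, the following is a minimal sketch, not the authors' code. It hides a fraction of a target station's observed days and imputes each hidden day from the other stations' same-day concentrations with IDW. The station names, planar coordinates in kilometres, the power parameter of 2, the seeded random choice of hidden days, and the use of RMSE as the error measure are all assumptions made for illustration; the abstract does not state which error metric was used.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Planar (x, y) station coordinates in kilometres; an assumption for this sketch.
Coord = Tuple[float, float]


def idw(target: Coord, neighbours: Sequence[Tuple[Coord, float]], power: float = 2.0) -> float:
    """Inverse distance weighting estimate at `target` from (coord, value) pairs."""
    num = den = 0.0
    for (x, y), value in neighbours:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # a station at the target location: use its value as-is
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no neighbouring observations available")
    return num / den


def mask_and_impute(
    series: Dict[str, List[Optional[float]]],
    coords: Dict[str, Coord],
    target: str,
    fraction: float,
    seed: int = 0,
) -> float:
    """Hide `fraction` of the target station's observed days, impute each one
    with IDW from the other stations on the same day, and return the RMSE."""
    observed = [t for t, v in enumerate(series[target]) if v is not None]
    rng = random.Random(seed)  # seeded so the hidden days are reproducible
    hidden = rng.sample(observed, k=max(1, round(fraction * len(observed))))
    sq_errors = []
    for t in hidden:
        neighbours = [
            (coords[s], values[t])
            for s, values in series.items()
            if s != target and values[t] is not None
        ]
        if not neighbours:
            continue  # no other station reported that day, so the day is skipped
        estimate = idw(coords[target], neighbours)
        sq_errors.append((estimate - series[target][t]) ** 2)
    if not sq_errors:
        raise ValueError("no hidden day could be imputed")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


if __name__ == "__main__":
    # Toy values invented for illustration only; not observations from any network.
    coords = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
    series = {
        "A": [10.0, 20.0, 30.0, 40.0, 35.0, 25.0, 15.0, 5.0, 12.0, 18.0],
        "B": [12.0, 18.0, 33.0, 41.0, 30.0, 27.0, 14.0, 6.0, 10.0, 20.0],
        "C": [8.0, 22.0, 27.0, None, 38.0, 22.0, 17.0, 4.0, 13.0, None],
    }
    for fraction in (0.1, 0.2, 0.3):
        rmse = mask_and_impute(series, coords, "A", fraction)
        print(f"{fraction:.0%} hidden -> RMSE {rmse:.2f}")
```

Because IDW weights depend only on distance, two neighbours at the same distance always receive the same weight, regardless of how closely each one's series actually tracks the target station.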

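Gaussian process regression, the kriging baseline named in the abstract, can be sketched without external dependencies as follows. This is a minimal illustration under assumed settings, not the configuration used in the paper, which the abstract does not describe. It uses a squared-exponential (RBF) kernel with fixed, hand-picked variance, length scale, and noise, and it plugs in the sample mean of the training values as a constant prior mean, which makes it close to simple kriging. The Cholesky factorization of the n × n covariance matrix costs O(n³) operations, and fitting the kernel hyperparameters by maximizing the marginal likelihood, a step this sketch skips, typically repeats that factorization many times.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # planar station coordinates in km (assumption)


def rbf(a: Coord, b: Coord, variance: float, length: float) -> float:
    """Squared-exponential covariance between two locations."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return variance * math.exp(-d2 / (2.0 * length ** 2))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T; A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b: forward substitution for L z = b, then back substitution for L^T x = z."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(
    train_x: Sequence[Coord],
    train_y: Sequence[float],
    test_x: Sequence[Coord],
    variance: float = 1.0,
    length: float = 10.0,
    noise: float = 1e-6,
) -> List[float]:
    """GP posterior mean with fixed RBF hyperparameters and the sample mean as prior mean."""
    mean = sum(train_y) / len(train_y)
    n = len(train_x)
    K = [
        [rbf(train_x[i], train_x[j], variance, length) + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    L = cholesky(K)  # O(n^3) factorization of the covariance matrix
    alpha = solve_cholesky(L, [y - mean for y in train_y])
    return [
        mean + sum(rbf(x, xi, variance, length) * a for xi, a in zip(train_x, alpha))
        for x in test_x
    ]


if __name__ == "__main__":
    # Hypothetical stations and values, for illustration only.
    stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (12.0, 9.0)]
    values = [10.0, 20.0, 30.0, 25.0]
    print(gp_predict(stations, values, [(5.0, 5.0)]))
```

In an imputation setting, `train_x` would hold the coordinates of the stations that reported on a given day and `test_x` the coordinates of the station whose value is missing. Far from every station, the prediction falls back to the prior mean.
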