A model for environmental data extraction from multimedia and its evaluation against various chemical weather forecasting datasets

Abstract. Environmental data analysis and information provision are of great importance to the public, since environmental conditions are closely linked to health and directly affect a variety of everyday activities. Several free web-based services now provide environmental information in various formats, with map images being the most common way of presenting air quality and pollen forecasts. This format, although intuitive for humans, complicates the extraction and processing of the underlying data. A typical example is chemical weather forecasts, which are usually encoded as heatmaps (i.e. graphical representations of matrix data using colors), while the forecasted numerical pollutant concentrations are commonly unavailable. This work presents a model for the semi-automatic extraction of such information, based on a template configuration tool, on methodologies for data reconstruction from images, and on text processing and Optical Character Recognition (OCR). These modules are integrated into a standalone framework, which is extensively evaluated by comparing data extracted from a variety of chemical weather heatmaps against the actual numerical values produced by chemical weather forecasting models. The results demonstrate satisfactory performance in terms of data recovery and positional accuracy.
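To illustrate the idea of recovering numerical values from a color-coded heatmap, the following is a minimal sketch and not the authors' implementation. It assumes that each legend color stands for a concentration interval and that every pixel can be assigned to its nearest legend color in RGB space. The legend values, the names `LEGEND`, `nearest_legend_entry` and `reconstruct`, and the choice of the interval midpoint as the recovered value are all hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
LegendEntry = Tuple[RGB, float, float]

# Hypothetical legend: (color, lower bound, upper bound) of a pollutant
# concentration interval. In practice these would come from the map's
# color scale, e.g. via a per-provider template.
LEGEND: List[LegendEntry] = [
    ((0, 0, 255), 0.0, 20.0),
    ((0, 255, 0), 20.0, 40.0),
    ((255, 255, 0), 40.0, 60.0),
    ((255, 0, 0), 60.0, 80.0),
]


def nearest_legend_entry(pixel: RGB, legend: Sequence[LegendEntry]) -> LegendEntry:
    """Return the legend entry whose color is closest to the pixel
    (squared Euclidean distance in RGB space)."""
    return min(
        legend,
        key=lambda entry: sum((a - b) ** 2 for a, b in zip(pixel, entry[0])),
    )


def reconstruct(
    image: Sequence[Sequence[RGB]],
    legend: Sequence[LegendEntry] = LEGEND,
) -> List[List[float]]:
    """Map every pixel to the midpoint of its nearest legend interval."""
    grid: List[List[float]] = []
    for row in image:
        values = []
        for pixel in row:
            _, low, high = nearest_legend_entry(pixel, legend)
            values.append((low + high) / 2)  # assumed representative value
        grid.append(values)
    return grid


# A 2x2 toy "heatmap" with slightly noisy colors.
toy_image = [
    [(0, 0, 250), (250, 10, 5)],
    [(10, 250, 10), (255, 250, 0)],
]
print(reconstruct(toy_image))
```

Because a heatmap only records which interval a value falls into, a reconstruction of this kind can at best recover the values up to the width of each color class; a real pipeline would also need to handle map overlays such as borders and text, and to translate pixel positions into geographic coordinates.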
