PERCEIVE: Precipitation Data Characterization by Means of Frequent Spatio-Temporal Sequences

Large amounts of climatology data, including daily precipitation data, are now collected by sensors deployed around the world. Analyzing these large data sets with scalable machine learning and data mining techniques makes it possible to extract knowledge from them, infer patterns and correlations among sets of spatio-temporal events, and characterize those events. In this paper, we describe PERCEIVE, a data-driven framework based on frequent spatio-temporal sequences that aims to extract frequent correlations among spatio-temporal precipitation events. It is implemented in R and Apache Spark for scalability, and it also provides a visualization module for showing the extracted patterns intuitively. A preliminary set of experiments shows the efficiency and effectiveness of PERCEIVE.
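To make the notion of a frequent spatio-temporal sequence concrete, the following is a minimal illustrative sketch in Python. It is not the PERCEIVE implementation, which the abstract says is built with R and Apache Spark. The function name `frequent_pairs`, the station identifiers, the discretized precipitation levels ("none", "light", "heavy"), and the `max_gap` and `min_support` parameters are all assumptions introduced here for illustration. The sketch counts, for each ordered pair of (station, level) events, the number of days on which the first event is followed by the second within `max_gap` days, and keeps the pairs that reach `min_support`.

```python
from collections import defaultdict
from itertools import product


def frequent_pairs(events, max_gap=1, min_support=2):
    """Count length-2 sequences A -> B, where B occurs 1..max_gap days after A.

    events: iterable of (day, station, level) tuples (hypothetical toy format).
    Returns {((station_a, level_a), (station_b, level_b)): support}.
    """
    # Group the (station, level) items observed on each day.
    by_day = defaultdict(set)
    for day, station, level in events:
        by_day[day].add((station, level))

    support = defaultdict(int)
    for day, first_items in by_day.items():
        # Items seen within the allowed gap; a set, so each start day
        # contributes at most once to any given pattern.
        later = set()
        for gap in range(1, max_gap + 1):
            later |= by_day.get(day + gap, set())
        for a, b in product(first_items, later):
            support[(a, b)] += 1

    # Keep only the patterns that meet the minimum support threshold.
    return {p: s for p, s in support.items() if s >= min_support}


# Toy data (fabricated for illustration only, not from the paper).
events = [
    (1, "S1", "heavy"), (2, "S2", "heavy"),
    (4, "S1", "heavy"), (5, "S2", "heavy"),
    (7, "S1", "light"), (8, "S2", "none"),
]
result = frequent_pairs(events, max_gap=1, min_support=2)
print(result)
```

On this toy input, the only pattern reaching the threshold is heavy precipitation at the hypothetical station S1 followed one day later by heavy precipitation at S2. A realistic pipeline would mine longer sequences and distribute the counting, which this sketch does not attempt.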
