Feasibility Study of Applying Descriptive ILP to Large Geographic Databases

This paper discusses a case study whose aim is to discover regularities and anomalies in large databases of geographic data, in order to improve and maintain overall data quality. The application of Inductive Logic Programming (ILP), and of descriptive ILP in particular, to this case is motivated and discussed. In an experiment on real-world data, a classical descriptive ILP algorithm (WARMR), which adopts the setting of learning from interpretations, is applied to data on the hamlet of Beggen, Luxembourg, to mine for rules describing regularities. In a subsequent stage, the interpretations that violate these rules could be traced to identify candidate anomalies. A rule export module was set up to feed the results to a rule checking engine for further validation of the experiment. Finally, the results are discussed, the feasibility of the system used in the case study is assessed, and the possibilities for a larger-scale application of the experiment are considered.
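
To make the anomaly-tracing step concrete, here is a minimal sketch, assuming a simplified version of the learning-from-interpretations setting: each interpretation is a set of ground facts (for example, one per geographic object together with its context), and a rule `body -> head` is a pair of conjunctive queries in which identifiers starting with an uppercase letter are variables. The sketch does not reproduce WARMR's level-wise search for frequent queries; it only evaluates one given rule and lists the interpretations where the body holds but the body extended with the head does not. The predicates (`building`, `on_parcel`, `adjacent`, `road`), the toy data and the helpers `match` and `rule_stats` are hypothetical illustrations, not taken from the Beggen case study.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Fact = Tuple[str, ...]        # ground fact, e.g. ("on_parcel", "b1", "p1")
Literal = Tuple[str, ...]     # query literal; uppercase-initial arguments are variables
Interpretation = FrozenSet[Fact]


def is_var(term: str) -> bool:
    """Prolog-style convention (assumed): variables start with an uppercase letter."""
    return term[:1].isupper()


def match(query: List[Literal], facts: Interpretation,
          subst: Optional[Dict[str, str]] = None) -> bool:
    """True if one substitution turns every literal of the query into a fact."""
    subst = subst or {}
    if not query:
        return True
    first, rest = query[0], query[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(subst)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        # Backtrack over alternative facts if the rest of the query fails.
        if ok and match(rest, facts, new):
            return True
    return False


def rule_stats(body: List[Literal], head: List[Literal],
               interpretations: List[Interpretation]):
    """Support, confidence and violating interpretations of body -> head."""
    covers_body = [i for i, facts in enumerate(interpretations) if match(body, facts)]
    # body + head is evaluated jointly so shared variables bind consistently.
    violators = [i for i in covers_body if not match(body + head, interpretations[i])]
    satisfied = len(covers_body) - len(violators)
    support = satisfied / len(interpretations)
    confidence = satisfied / len(covers_body) if covers_body else 0.0
    return support, confidence, violators


# Hypothetical toy data, not from the case study.
interpretations = [
    frozenset({("building", "b1"), ("on_parcel", "b1", "p1"),
               ("adjacent", "p1", "r1"), ("road", "r1")}),
    frozenset({("building", "b2"), ("on_parcel", "b2", "p2"),
               ("adjacent", "p2", "r2"), ("road", "r2")}),
    frozenset({("building", "b3"), ("on_parcel", "b3", "p3")}),  # no road access
    frozenset({("road", "r4")}),                                 # no building
]

body = [("building", "B"), ("on_parcel", "B", "P")]
head = [("adjacent", "P", "R"), ("road", "R")]
support, confidence, violators = rule_stats(body, head, interpretations)
print(support, confidence, violators)  # violators is [2]: the candidate anomaly
```

Interpretation 3 does not count against the rule because its body never holds; only interpretation 2, a building on a parcel with no adjacent road, is flagged for inspection.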
