Outlier detection in relational data: A case study in geographical information systems

Geographical information systems are used for a wide variety of purposes. Many of them rely on a large database of geographical data, whose correctness strongly influences the reliability of the system. In this paper, we present an approach to quality maintenance based on the automatic discovery of non-perfect regularities in the data, that is, regularities that hold in most but not all cases. The underlying idea is that exceptions to these regularities ('outliers') are probable errors in the data, to be investigated by a human expert. A case study shows how the tool can be used to extract valuable knowledge about outliers in real-world geographical data, in a manner that adapts to the evolving data model underlying the data. While the tool is aimed specifically at geographical information systems, the underlying approach is more broadly applicable to quality maintenance in data-rich intelligent systems.
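The core idea, flagging exceptions to regularities that almost always hold, can be illustrated with a minimal sketch. It is a simplified, single-table stand-in and not the relational discovery method of the paper: it only considers rules of the form "attribute A = a implies attribute B = b". The function name `find_outliers`, the thresholds, and the road data are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def find_outliers(records, min_confidence=0.9, min_support=3):
    """Flag records that violate a near-perfect rule 'cond = v => target = w'.

    Simplified, propositional illustration (an assumption of this sketch):
    a rule counts as a non-perfect regularity when its confidence is at
    least min_confidence but below 1.0. Returns (rule, record_index) pairs.
    """
    outliers = []
    if not records:
        return outliers
    attributes = sorted(records[0].keys())
    for cond in attributes:
        for target in attributes:
            if cond == target:
                continue
            # Group record indices by their value for the condition attribute.
            groups = defaultdict(list)
            for idx, rec in enumerate(records):
                groups[rec[cond]].append(idx)
            for cond_value, idxs in groups.items():
                if len(idxs) < min_support:
                    continue  # too few cases to call it a regularity
                counts = Counter(records[i][target] for i in idxs)
                top_value, top_count = counts.most_common(1)[0]
                confidence = top_count / len(idxs)
                if min_confidence <= confidence < 1.0:
                    rule = (cond, cond_value, target, top_value)
                    # Exceptions to the rule are candidate errors.
                    for i in idxs:
                        if records[i][target] != top_value:
                            outliers.append((rule, i))
    return outliers


# Hypothetical road segments: nine consistent highways, one suspicious one.
roads = (
    [{"road_type": "highway", "speed_limit": 120} for _ in range(9)]
    + [{"road_type": "highway", "speed_limit": 30}]
    + [{"road_type": "residential", "speed_limit": 50} for _ in range(5)]
)

for rule, index in find_outliers(roads, min_confidence=0.8):
    print(rule, "violated by record", index, roads[index])
```

In this sketch the flagged record is only a candidate: as in the approach described above, a human expert would decide whether it is a genuine error or a legitimate exception.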
