Geospatial data conflation: a formal approach based on optimization and relational databases

ABSTRACT Geospatial data conflation aims to match counterpart features from two or more data sources so that the information in the data can be combined and better utilized. Because conflation is important in spatial analysis, different approaches to the problem have been proposed, ranging from simple buffer-based methods to probability- and optimization-based models. In this paper, I propose a formal framework for conflation that integrates two powerful tools of geospatial computation: optimization and relational databases. I discuss the connection between relational database theory and conflation, and demonstrate how the conflation process can be formulated and carried out in standard relational databases. I also propose a set of new optimization models that can be used inside relational databases to solve the conflation problem. The optimization models are based on the minimum cost circulation problem in operations research (also known as the network flow problem) and generalize existing optimal conflation models, which are primarily based on the assignment problem. Computational experiments on comparable datasets show that the proposed conflation method is effective and outperforms existing optimal conflation models by a large margin. Given its generality, the new method may be applicable to other data types and conflation problems.
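
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the models proposed in the paper. It assumes point features, Euclidean distance as the only similarity measure, and an assignment-style objective of the kind the abstract attributes to existing optimal conflation models, not the minimum cost circulation formulation. The table names `a` and `b`, the `cutoff` parameter, and the `unmatched_penalty` are hypothetical choices made for illustration. A relational join (here in SQLite) produces the candidate pairs, and a brute-force search, suitable only for toy inputs, picks the one-to-one matching with the lowest total cost.

```python
import sqlite3


def candidate_pairs(features_a, features_b, cutoff):
    """Return {(id_a, id_b): distance} for all pairs within `cutoff`.

    The pairs come from a relational join on a distance predicate.
    Table and column names are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id TEXT PRIMARY KEY, x REAL, y REAL)")
    conn.execute("CREATE TABLE b (id TEXT PRIMARY KEY, x REAL, y REAL)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", features_a)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", features_b)
    rows = conn.execute(
        """
        SELECT a.id, b.id,
               (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) AS d2
        FROM a JOIN b
          ON (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) <= ?
        ORDER BY a.id, b.id
        """,
        (cutoff * cutoff,),
    ).fetchall()
    conn.close()
    return {(i, j): d2 ** 0.5 for i, j, d2 in rows}


def best_matching(ids_a, ids_b, cost, unmatched_penalty):
    """Exhaustively search one-to-one matchings and return (total, pairs).

    Each feature in A is either paired with an unused candidate in B or
    left unmatched at a fixed penalty. Exponential time: toy inputs only.
    """
    best = (float("inf"), [])

    def search(k, used, total, pairs):
        nonlocal best
        if total >= best[0]:
            return  # costs are non-negative, so this branch cannot improve
        if k == len(ids_a):
            best = (total, list(pairs))
            return
        i = ids_a[k]
        for j in ids_b:
            if j not in used and (i, j) in cost:
                used.add(j)
                pairs.append((i, j))
                search(k + 1, used, total + cost[(i, j)], pairs)
                pairs.pop()
                used.remove(j)
        search(k + 1, used, total + unmatched_penalty, pairs)

    search(0, set(), 0.0, [])
    return best


# Toy data (made up for illustration): a3 has no counterpart in B.
A = [("a1", 0, 0), ("a2", 3, 0), ("a3", 100, 100)]
B = [("b1", 2, 0), ("b2", -2, 0)]
cost = candidate_pairs(A, B, cutoff=4.0)
total, pairs = best_matching(
    [r[0] for r in A], [r[0] for r in B], cost, unmatched_penalty=4.0
)
print(total, pairs)
```

In this toy case, matching each feature in A to its nearest neighbor would give both a1 and a2 a claim on b1 (a1 is equidistant from b1 and b2), whereas the joint optimization assigns a1 to b2 and a2 to b1 and leaves a3 unmatched. In practice the exhaustive search would be replaced by a proper assignment or network flow solver.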
