Exploiting online sources to accurately geocode addresses

Many Geographic Information System (GIS) applications require converting an address to geographic coordinates, a process called geocoding. The traditional geocoding method uses a street vector data source, such as TIGER/Line, to obtain the address range and coordinates of the street segment on which the given address lies. It then estimates the location of the address by interpolating along that segment according to where the address number falls within the segment's address range. However, this approximation is often inaccurate because it assumes that a property exists at every possible address in the range and that all properties are of equal size. To reduce this inaccuracy, we propose two new geocoding methods that exploit additional online data sources. The first, the uniform-lot-size method, uses the number of addresses (lots) actually present on the street segment to approximate the location of an address. The second, the actual-lot-size method, also takes into account the size and orientation of each lot on the street segment. We also describe an implementation of these methods that uses an information mediator to obtain the actual number and sizes of the lots on each street from various property tax web sites. We geocoded an area covering 13 blocks (267 addresses) using all three methods. In our evaluation, the traditional method produced an average error of 36.85 meters, while the uniform-lot-size and actual-lot-size methods produced average errors of 7.87 meters and 1.63 meters, respectively.
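The three methods differ mainly in how they compute the fractional position of an address along its street segment. The following is a minimal sketch under stated assumptions, not the authors' implementation: `StreetSegment` and all function names are hypothetical; all addresses are assumed to lie on one side of the street; lots are assumed to run contiguously from the segment's low-address end; each address is placed at the centre of its lot; the actual-lot-size variant takes each lot's street-facing width as input instead of deriving it from lot orientation; and positional error is measured with the haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class StreetSegment:
    """Hypothetical street-segment record, e.g. derived from TIGER/Line data."""
    start: Point      # coordinate at the low end of the address range
    end: Point        # coordinate at the high end of the address range
    range_low: int    # lowest possible address number (one side of the street)
    range_high: int   # highest possible address number (same side)


def _interpolate(seg: StreetSegment, fraction: float) -> Point:
    """Linear interpolation in degrees; adequate for short street segments."""
    lat = seg.start[0] + fraction * (seg.end[0] - seg.start[0])
    lon = seg.start[1] + fraction * (seg.end[1] - seg.start[1])
    return (lat, lon)


def geocode_traditional(seg: StreetSegment, number: int) -> Point:
    """Address-range interpolation: assumes every number in the range exists
    and that all properties have equal size."""
    span = seg.range_high - seg.range_low
    fraction = 0.5 if span == 0 else (number - seg.range_low) / span
    return _interpolate(seg, fraction)


def geocode_uniform_lot(seg: StreetSegment, numbers: Sequence[int], number: int) -> Point:
    """Uniform-lot-size (assumed reading): split the segment into one equal lot
    per address that actually exists and place the address at its lot centre."""
    ordered = sorted(numbers)
    i = ordered.index(number)  # raises ValueError if the address is absent
    fraction = (i + 0.5) / len(ordered)
    return _interpolate(seg, fraction)


def geocode_actual_lot(seg: StreetSegment, lots: Sequence[Tuple[int, float]], number: int) -> Point:
    """Actual-lot-size (assumed reading): lots is [(address number, street-facing
    width)], e.g. from property tax records; place the address at the centre of
    its own frontage, offset by the widths of the lots before it."""
    ordered = sorted(lots)
    total = sum(width for _, width in ordered)
    offset = 0.0
    for num, width in ordered:
        if num == number:
            return _interpolate(seg, (offset + width / 2) / total)
        offset += width
    raise ValueError(f"address {number} not found on segment")


def haversine_m(a: Point, b: Point, radius_m: float = 6371000.0) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


def average_error_m(predicted: List[Point], truth: List[Point]) -> float:
    """Mean positional error in metres between geocoded and true locations."""
    return sum(haversine_m(p, t) for p, t in zip(predicted, truth)) / len(truth)
```

As a made-up illustration (not data from the evaluation above): on a segment whose address range is 100 to 198 but on which only addresses 100, 104 and 110 exist, with frontages of 10, 30 and 20 m, address 104 is placed about 4% of the way along the segment by the traditional method, at the midpoint by the uniform-lot-size method, and about 42% of the way by the actual-lot-size method.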
