EaserGeocoder: integrative geocoding with machine learning (demo paper)

The increasing availability of large volumes of address data creates opportunities for data-driven studies that improve decision making in business applications and support precision public health with high-resolution geolocations. Geocoding a large number of addresses is challenging, however, because of its high cost and because it often discloses sensitive data to vendors over the Web. Most geocoders rely on Web APIs, which require sending private addresses over the Internet; this may not be an option for applications with sensitive data, such as those in public health and geo-medicine. Meanwhile, the cost of geocoding a massive number of addresses can be high and becomes a major hurdle for many users. To overcome these challenges, we developed EaserGeocoder, an open source, on-premise geocoding system that uses a novel integrative geocoding model to achieve high accuracy by integrating multiple open data sources. EaserGeocoder uses machine learning based approaches to determine the best answers from multiple data sources. It can also be easily parallelized, achieving high scalability through parallelized search and distributed computing. EaserGeocoder is on a par with commercial geocoding systems, outperforms open source systems, and is available for free.
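
The integrative idea described above (search several local data sources in parallel, then pick the best candidate) can be illustrated with a minimal sketch. This is not EaserGeocoder's actual code or API: the `Candidate` fields, the two in-memory sources, and the fixed linear `score` function are all hypothetical stand-ins, and the hand-written score only marks the place where a trained model would rank candidates.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    source: str         # name of the data source that produced the match
    lat: float
    lon: float
    match_ratio: float  # fraction of address tokens matched, in [0, 1]
    is_point: bool      # True for point-level data, False for interpolated


# Hypothetical in-memory stand-ins for local open data sources.
def search_points(address: str) -> Optional[Candidate]:
    table = {"1 main st": (40.0, -73.0)}
    hit = table.get(address.lower())
    return Candidate("points", hit[0], hit[1], 1.0, True) if hit else None


def search_street_ranges(address: str) -> Optional[Candidate]:
    # Assumption: every address can be interpolated along some street segment.
    return Candidate("street_ranges", 40.001, -73.001, 0.8, False)


def score(c: Candidate) -> float:
    # Placeholder for a learned ranking model: a fixed linear score.
    return 0.7 * c.match_ratio + 0.3 * (1.0 if c.is_point else 0.0)


Source = Callable[[str], Optional[Candidate]]
SOURCES: List[Source] = [search_points, search_street_ranges]


def geocode(address: str, sources: List[Source]) -> Optional[Candidate]:
    # Query all sources concurrently; nothing leaves the local machine.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        results = list(pool.map(lambda search: search(address), sources))
    candidates = [c for c in results if c is not None]
    # Keep the highest-scoring candidate, or None if no source matched.
    return max(candidates, key=score) if candidates else None
```

In this sketch, an address found in the point-level source wins over the interpolated match, while an address missing from it falls back to the street-range result. A real deployment would replace `score` with a model trained on labeled geocoding outcomes and distribute batches of addresses across workers.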