The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources
With funding from the Sloan Foundation, the Boston Area Research Initiative (BARI), and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform, or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. Once archived, streaming data grows large quickly, and most GIS systems do not support interactive visualization of millions of objects, so a new platform is needed. Our instance of the BOP is loaded with the most recent billion geo-tweets and is fed a real-time stream of about 1 million tweets per day; the CGA has been harvesting and archiving geo-tweets since 2012. Tweets flowing into the BOP are enriched with sentiment and census information to support further analysis. Incoming and intermediate data are streamed to and stored in Apache Kafka. The core of the BOP is Apache Solr, which supports fast search. Several significant Solr enhancements were developed (and contributed back), notably 2D "heatmap faceting" to support spatial visualization. The BOP fronts Solr with a RESTful web service that provides a friendly and secure API, accessed from a browser-based client. The client dynamically displays the temporal and spatial distributions of result sets containing hundreds of millions of features. The system is open source and runs on commodity hardware. The geo-tweet instance is hosted on the Massachusetts Open Cloud (MOC), an OpenStack environment (OpenStack 2017). All components are deployed in Docker containers and orchestrated by Kontena.

∗Corresponding author. Email address: kakkar@fas.harvard.edu (Devika Kakkar)
Submitted to FOSS4G 2017 Conference Proceedings, Boston, USA. September 20, 2017
FOSS4G 2017 Academic Program
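Heatmap faceting lets a browser client show the spatial distribution of a huge result set without transferring individual points: the server returns only a grid of counts over the requested bounding box, so the payload size depends on the grid dimensions rather than on the number of matching tweets. The Python sketch below illustrates that idea by binning (longitude, latitude) points into a rows × columns grid. It is a hypothetical illustration, not the BOP's or Solr's code: the function name `heatmap_counts`, the north-first row order, and the sample coordinates are assumptions, and Solr builds its heatmap from the indexed cells of a spatial (RPT) field rather than by scanning raw points.

```python
"""Minimal sketch of 2D heatmap faceting: count points per grid cell.

Assumption: this mirrors only the idea behind Solr's heatmap facet
(a grid of counts over a bounding box), not its implementation.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def heatmap_counts(
    points: Sequence[Point],
    min_x: float, min_y: float, max_x: float, max_y: float,
    columns: int, rows: int,
) -> List[List[int]]:
    """Return a rows x columns grid of counts; row 0 is the northern edge."""
    if columns <= 0 or rows <= 0:
        raise ValueError("grid dimensions must be positive")
    cell_w = (max_x - min_x) / columns
    cell_h = (max_y - min_y) / rows
    grid = [[0] * columns for _ in range(rows)]
    for x, y in points:
        # Skip points outside the requested bounding box.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Points exactly on the max edge fall into the last column/row.
        col = min(int((x - min_x) / cell_w), columns - 1)
        row_from_south = min(int((y - min_y) / cell_h), rows - 1)
        # Flip the row index so that row 0 is the northernmost row.
        grid[rows - 1 - row_from_south][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical sample: three geo-tweets near Boston, one near London.
    tweets = [(-71.06, 42.36), (-71.10, 42.37), (-71.05, 42.35), (-0.13, 51.51)]
    # A whole-world grid of 8 x 4 cells, each 45 degrees on a side.
    for row in heatmap_counts(tweets, -180, -90, 180, 90, columns=8, rows=4):
        print(row)
```

Because the grid has a fixed number of cells, the response stays small whether a query matches a thousand tweets or hundreds of millions, which is what makes interactive map rendering feasible at this scale. In Solr itself, the equivalent request is made with the `facet.heatmap` parameter on a spatial field, with `facet.heatmap.geom` bounding the region of interest.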