Plenario: An Open Data Discovery and Exploration Platform for Urban Science

The past decade has seen the widespread release of open data concerning city services, conditions, and activities by government bodies and public institutions of all sizes. Hundreds of open data portals now host thousands of datasets of many different types. These new data sources represent enormous potential for improved understanding of urban dynamics and processes and, ultimately, for more livable, efficient, and prosperous communities. However, those who seek to realize this potential quickly find that locating and applying the data relevant to any particular question can be extraordinarily difficult, due to decentralized storage, heterogeneous formats, and poor documentation. In this context, we introduce Plenario, a platform designed to automate time-consuming tasks associated with the discovery, exploration, and application of open city data and, in so doing, to reduce barriers to data use for researchers, policymakers, service providers, journalists, and members of the general public. Key innovations include a geospatial data warehouse that allows data from many sources to be registered into a common spatial and temporal frame; simple and intuitive interfaces that permit rapid discovery and exploration of data subsets pertaining to a particular area and time, regardless of type and source; easy export of such data subsets for further analysis; a user-configurable data ingest framework for automated importing and periodic updating of new datasets into the data warehouse; cloud hosting for elastic scaling and rapid creation of new Plenario instances; and an open source implementation to enable community contributions. We describe here the architecture and implementation of the Plenario platform, discuss lessons learned from its use by several communities, and outline plans for future work.

Copyright 2014 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
