The Internet of Things and fast data streams: prospects for geospatial data science in emerging information ecosystems

ABSTRACT

This paper surveys the rapid development of the Internet of Things, the massive data streams that are only now beginning to be generated by it, and the resulting opportunities and challenges that these data streams bring to geographic information analysis. These challenges arise because streaming data volumes cannot be analyzed with the standard repertoire of methods designed for static geospatial datasets. New approaches are needed, not to supplant these existing tools, but to supplement them. A focus is placed on the concept of data velocity (fast data) and its effects on sampling and inference. Innovative data ingestion strategies based on principles related to reservoir sampling and sketching are described. Dynamic temporal data flows present significant challenges to load balancing in distributed (e.g. cloud) parallel environments, even at exascale levels of performance. Advances in exploiting data locality based on geographical concepts, as well as processing methods based on edge and approximate computing, require further elucidation. Concepts are illustrated using a database compiled from a distributed sensor network of mobile radioactivity detectors.
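
The abstract names reservoir sampling and sketching as ingestion strategies without spelling them out. As a minimal, hypothetical sketch (not the authors' implementation), the Python below pairs the classic reservoir algorithm (Algorithm R, as presented by Vitter) with a Count-Min sketch, one common sketching structure chosen here purely for illustration. Both consume a stream in a single pass, using memory that does not grow with the length of the stream. The class names, the BLAKE2b-based row hashing, the table sizes, and the synthetic grid-cell readings are all assumptions.

```python
import hashlib
import random
from typing import Any, Iterator, List, Optional, Tuple


class Reservoir:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""

    def __init__(self, k: int, rng: Optional[random.Random] = None) -> None:
        self.k = k
        self.rng = rng or random.Random()
        self.items: List[Any] = []
        self.seen = 0  # number of items offered so far

    def offer(self, item: Any) -> None:
        if self.seen < self.k:
            self.items.append(item)  # fill the reservoir with the first k items
        else:
            # Keep this item with probability k / (seen + 1),
            # overwriting a uniformly chosen slot.
            j = self.rng.randint(0, self.seen)  # inclusive: j in [0, seen]
            if j < self.k:
                self.items[j] = item
        self.seen += 1


class CountMinSketch:
    """Fixed-memory frequency estimates that may overcount but never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key: Any) -> Iterator[Tuple[int, int]]:
        for row in range(self.depth):
            # One hash per row, derived by prefixing the row index (an assumption;
            # any family of independent-looking hash functions would serve).
            digest = hashlib.blake2b(f"{row}:{key!r}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: Any, count: int = 1) -> None:
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key: Any) -> int:
        # Colliding keys can only inflate a cell, so the row minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(key))


if __name__ == "__main__":
    rng = random.Random(7)
    reservoir = Reservoir(k=100, rng=rng)
    sketch = CountMinSketch(width=256, depth=4)
    # Synthetic stream: (hypothetical grid-cell id, counts-per-minute reading).
    for _ in range(50_000):
        cell = f"cell-{rng.randint(0, 19)}"
        reservoir.offer((cell, rng.gauss(40.0, 5.0)))
        sketch.add(cell)
    print(len(reservoir.items), reservoir.seen)  # 100 50000
    print(sketch.estimate("cell-3"))  # an upper bound on the true count for cell-3
```

In this arrangement the reservoir retains an unbiased sample of readings that can support inference about their distribution, while the sketch answers per-cell frequency queries with one-sided error. Widening the table reduces collisions, and adding rows lowers the chance that every row collides for a given key.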
