Transdisciplinary Foundations of Geospatial Data Science

Recent advances in data mining and machine learning have generated considerable excitement by providing solutions to challenging tasks (e.g., computer vision). However, many of these approaches have limited interpretability, so their success and failure modes are difficult to understand and their scientific robustness is difficult to evaluate. There is thus an urgent need to better understand the scientific reasoning behind data mining and machine learning approaches. This requires taking a transdisciplinary view of data science and recognizing its foundations in mathematics, statistics, and computer science. Focusing on the geospatial domain, we apply this transdisciplinary perspective to five common geospatial techniques: hotspot detection, colocation detection, prediction, outlier detection, and teleconnection detection. We also describe challenges and opportunities for future advancement.
