Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets

The increasing ability to collect data from urban environments, coupled with a push towards openness by governments, has resulted in the availability of numerous spatio-temporal data sets covering diverse aspects of a city. Discovering relationships between these data sets can produce new insights by enabling domain experts to not only test but also generate hypotheses. However, discovering these relationships is difficult. First, a relationship between two data sets may occur only at certain locations and/or time periods. Second, the sheer number and size of the data sets, coupled with the diverse spatial and temporal scales at which the data are available, present computational challenges on all fronts, from indexing and querying to analysis. Finally, it is non-trivial to differentiate between meaningful and spurious relationships. To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets. We have performed an experimental evaluation using over 300 spatio-temporal urban data sets, which shows that our approach is scalable and effective at identifying interesting relationships.
