Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space and, consequently, to require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. Compared to exact visualizations created by scanning the entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that query times in our examples are dominated by network and user-interaction latencies.
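To make the idea of precomputed aggregates over a space-time hierarchy concrete, here is a minimal Python sketch of a much simpler relative of a nanocube: a quadtree over a discretized 2-D grid in which every node stores a prefix-summed histogram over time bins. It is only an illustration under stated assumptions, not the nanocube construction itself. It has no categorical dimensions and none of the node sharing that keeps a nanocube small. The class name `SpaceTimeCube`, its methods, and the discretization into integer grid cells and integer time bins are all hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


class SpaceTimeCube:
    """Hypothetical sketch: a quadtree over a 2**depth x 2**depth grid where each
    node keeps a prefix-summed time histogram, so any (rectangle, time interval)
    count is answered from precomputed aggregates instead of a data scan."""

    def __init__(self, points, depth, num_bins):
        # points: iterable of (x, y, t) with integer cells 0 <= x, y < 2**depth
        # and integer time bins 0 <= t < num_bins (assumed discretization).
        self.depth = depth
        hist = defaultdict(lambda: [0] * num_bins)
        for x, y, t in points:
            # Count the point in every ancestor cell, from the root (level 0)
            # down to the finest level (level == depth).
            for level in range(depth + 1):
                shift = depth - level
                hist[(level, x >> shift, y >> shift)][t] += 1
        # prefix[key][k] = number of points in that node with time bin < k.
        self.prefix = {key: [0] + list(accumulate(h)) for key, h in hist.items()}

    def _node_count(self, key, t0, t1):
        p = self.prefix.get(key)
        return 0 if p is None else p[t1] - p[t0]

    def count(self, x0, y0, x1, y1, t0, t1):
        """Points with x0 <= x < x1, y0 <= y < y1 and t0 <= t < t1."""
        return self._query(0, 0, 0, x0, y0, x1, y1, t0, t1)

    def _query(self, level, cx, cy, x0, y0, x1, y1, t0, t1):
        key = (level, cx, cy)
        if key not in self.prefix:
            return 0  # empty subtree: no points were ever inserted here
        size = 1 << (self.depth - level)  # node side length in finest cells
        nx0, ny0 = cx * size, cy * size
        nx1, ny1 = nx0 + size, ny0 + size
        if nx1 <= x0 or nx0 >= x1 or ny1 <= y0 or ny0 >= y1:
            return 0  # node is disjoint from the query rectangle
        if x0 <= nx0 and nx1 <= x1 and y0 <= ny0 and ny1 <= y1:
            return self._node_count(key, t0, t1)  # node fully covered
        # Node straddles the rectangle boundary: descend into its four children.
        return sum(
            self._query(level + 1, 2 * cx + dx, 2 * cy + dy, x0, y0, x1, y1, t0, t1)
            for dx in (0, 1)
            for dy in (0, 1)
        )

    def heatmap(self, level, t0, t1):
        """Non-zero counts for every cell at one zoom level in [t0, t1)."""
        counts = {}
        for lvl, cx, cy in self.prefix:
            if lvl == level:
                c = self._node_count((lvl, cx, cy), t0, t1)
                if c:
                    counts[(cx, cy)] = c
        return counts


# Usage: four points on a 4 x 4 grid with 4 time bins.
points = [(0, 0, 0), (1, 1, 1), (3, 2, 1), (3, 3, 3)]
cube = SpaceTimeCube(points, depth=2, num_bins=4)
print(cube.count(0, 0, 2, 2, 0, 4))  # 2
print(cube.count(2, 2, 4, 4, 1, 2))  # 1
print(cube.heatmap(1, 0, 4))         # {(0, 0): 2, (1, 1): 2}
```

In this sketch `count` descends only into quadtree nodes that straddle the boundary of the query rectangle and answers fully covered nodes from their stored time sums, while `heatmap` reads a single zoom level directly. That multiscale lookup is the kind of operation that lets a map view change zoom level without rescanning raw records; a real implementation would also walk down from the root for `heatmap` rather than scanning every stored node.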
