A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data

Abstract

Mining temporal data for information is often inhibited by a multitude of formats: regular or irregular time intervals, point events that need aggregating, multiple observational units or repeated measurements on multiple individuals, and heterogeneous data types. This work presents a cohesive conceptual framework for organizing and manipulating temporal data, which in turn flows into visualization, modeling, and forecasting routines. Tidy data principles are extended to temporal data by: (1) mapping the semantics of a dataset into its physical layout; (2) including an explicitly declared “index” variable representing time; and (3) incorporating a “key”, comprising one or more variables, that uniquely identifies units over time. This tidy data representation naturally supports thinking of operations on the data as building blocks that form part of a “data pipeline” in time-based contexts. A sound data pipeline facilitates a fluent workflow for analyzing temporal data. The infrastructure of tidy temporal data has been implemented in the R package tsibble. Supplementary materials for this article are available online.
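The index-and-key contract described in the abstract can be illustrated with a minimal sketch. This is not the tsibble API (tsibble is an R package); it is a plain-Python illustration under stated assumptions: the function name `validate_tidy_temporal`, the parameters `key_vars` and `index_var`, and the sample sensor data are all hypothetical. It checks that each combination of key values and index value occurs at most once, then orders the rows by key and time.

```python
from collections import Counter
from datetime import date


def validate_tidy_temporal(rows, key_vars, index_var):
    """Check the key/index contract and return rows ordered by (key, index).

    rows      -- list of dicts, one dict per observation
    key_vars  -- names of the variables that identify a unit (hypothetical name)
    index_var -- name of the variable that holds time (hypothetical name)
    """
    # Build one identifier per row: the key values followed by the index value.
    ids = [tuple(r[k] for k in key_vars) + (r[index_var],) for r in rows]

    # A unit may be observed at most once per time point.
    duplicates = [i for i, n in Counter(ids).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicated key/index pairs: {duplicates}")

    # Order by unit first, then by time within each unit.
    return sorted(rows, key=lambda r: (tuple(r[k] for k in key_vars), r[index_var]))


# Made-up example data: two sensors observed daily.
rows = [
    {"sensor": "B", "day": date(2019, 1, 1), "count": 7},
    {"sensor": "A", "day": date(2019, 1, 2), "count": 5},
    {"sensor": "A", "day": date(2019, 1, 1), "count": 3},
]
tidy = validate_tidy_temporal(rows, key_vars=["sensor"], index_var="day")
```

Declaring the key and index up front, rather than inferring them, lets downstream steps rely on the uniqueness check having already passed; a row that repeats an existing sensor and day would raise a `ValueError` here instead of silently entering later stages of a pipeline.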
