TurboLift: fast accuracy lifting for historical data recovery

Historical data often arrive as reports in which a time series is temporally aggregated at different levels, e.g., monthly counts of people infected with measles. In real databases, the time periods covered by different reports can overlap (i.e., some time-ticks are covered by more than one report) or leave gaps (i.e., some time-ticks are not covered by any report). However, data analysis and machine learning models require reconstructing the historical events at a finer granularity, e.g., weekly patient counts, for detailed analysis and prediction. Data disaggregation algorithms are therefore becoming increasingly important in various domains. Time series disaggregation methods commonly exploit domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions provided by existing disaggregation methods. Starting from a solution produced by a specific method, TurboLift finds a new solution that reduces the disaggregation error while staying close to the initial one. We derive a closed-form solution to the proposed formulation of TurboLift, which enables us to obtain an accurate reconstruction analytically, without resource- and time-consuming iterations. Experiments on real data from different domains showcase the effectiveness of TurboLift in terms of disaggregation error and in outlier and anomaly detection.
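To make the idea of refining an initial solution in closed form concrete, the following is a minimal sketch under stated assumptions, not the formulation derived in the paper. It assumes the reports satisfy y = A x for a 0/1 aggregation matrix A, and it assumes a refinement objective of the form ||A x - y||^2 + lam * ||x - x0||^2, whose minimizer is x = x0 + A^T (A A^T + lam I)^{-1} (y - A x0). The objective, the weight `lam`, and all function names are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: the objective and every name here are assumptions,
# not the TurboLift formulation from the paper.

def aggregation_matrix(reports, n_ticks):
    """One row per report, with 1s on the time-ticks [start, end) it covers."""
    return [[1.0 if start <= t < end else 0.0 for t in range(n_ticks)]
            for start, end in reports]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def refine(A, y, x0, lam):
    """Closed-form minimizer of ||A x - y||^2 + lam * ||x - x0||^2 (assumed objective)."""
    m, n = len(A), len(x0)
    residual = [yi - ai for yi, ai in zip(y, matvec(A, x0))]
    # Gram matrix A A^T + lam I; it is only m x m, with m the number of reports.
    gram = [[sum(a * b for a, b in zip(A[i], A[j])) + (lam if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    z = solve(gram, residual)
    # The correction A^T z is added to the initial solution.
    return [x0[t] + sum(A[i][t] * z[i] for i in range(m)) for t in range(n)]

# Two reports over six time-ticks: they overlap at tick 2 and leave tick 5 uncovered.
reports = [(0, 3), (2, 5)]
A = aggregation_matrix(reports, 6)
y = [6.0, 12.0]          # reported aggregates
x0 = [2.0] * 6           # initial solution from some other (hypothetical) method
x_loose = refine(A, y, x0, lam=1.0)    # stays closer to x0
x_tight = refine(A, y, x0, lam=1e-6)   # fits the reports almost exactly
print(x_loose)
print(x_tight)
```

In this sketch a large `lam` keeps the result near the initial solution, while a small `lam` makes it match the reported aggregates more closely. A time-tick in a gap, which no report covers, keeps its initial value because no report carries information about it.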
