ARES: Automatic Disaggregation of Historical Data

We address the challenge of reconstructing historical counts from aggregated, possibly overlapping historical reports. For example, given monthly and weekly sums, how can we recover the daily counts of people infected with flu? We propose an approach, called ARES (Automatic REStoration), that performs automatic data reconstruction in two phases: (1) first, it estimates the sequence of historical counts using domain knowledge, such as the smoothness and periodicity of historical events; (2) then, it uses the estimated sequence to learn notable patterns in the target sequence and refines the reconstructed time series accordingly. To derive such patterns, ARES uses an annihilating filter technique. The idea is to learn a linear shift-invariant operator whose response to the desired sequence is (approximately) zero, yielding a set of null-space equations that the desired signal should satisfy, without the need for accompanying data. The reconstruction accuracy can be further improved by applying the second phase iteratively. We evaluate ARES on real epidemiological data from the Tycho project and demonstrate that ARES recovers historical data from aggregated reports with high accuracy. In particular, it considerably outperforms top competitors, including least squares approximation and the more advanced H-FUSE method (42% and 34% improvement in average RMSE, respectively).
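The two phases described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the first-difference smoothness penalty, the SVD-based filter estimate, the weight `lam`, the filter order, and the synthetic weekly-periodic signal are all assumptions chosen for illustration. Reports are modeled as sums over windows of the unknown daily sequence, so each report is one row of an aggregation matrix `A`.

```python
import numpy as np

def aggregation_matrix(n, reports):
    """Each report (start, length) sums x[start:start+length]."""
    A = np.zeros((len(reports), n))
    for i, (start, length) in enumerate(reports):
        A[i, start:start + length] = 1.0
    return A

def smooth_reconstruct(A, y, lam=1.0):
    """Phase 1 (assumed form): least squares fit to the reports plus a
    penalty on first differences, which encodes smoothness."""
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference operator
    M = np.vstack([A, np.sqrt(lam) * D])
    b = np.concatenate([y, np.zeros(n - 1)])
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    return x

def learn_annihilating_filter(x, order):
    """Find unit-norm h minimizing sum_i (h . x[i:i+order+1])^2, i.e. the
    right singular vector for the smallest singular value of the window matrix."""
    H = np.array([x[i:i + order + 1] for i in range(len(x) - order)])
    _, _, Vt = np.linalg.svd(H)
    return Vt[-1]

def filter_matrix(h, n):
    """Banded matrix F such that (F @ x)[i] = h . x[i:i+len(h)]."""
    L = len(h)
    F = np.zeros((n - L + 1, n))
    for i in range(n - L + 1):
        F[i, i:i + L] = h
    return F

def refine(A, y, h, lam=1.0):
    """Phase 2 (assumed form): fit the reports while asking the learned
    filter to (approximately) annihilate the reconstruction."""
    F = filter_matrix(h, A.shape[1])
    M = np.vstack([A, np.sqrt(lam) * F])
    b = np.concatenate([y, np.zeros(F.shape[0])])
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    return x

# Synthetic example (hypothetical data): 28 days with a weekly cycle,
# observed through overlapping 5-day sums and one 28-day total.
n = 28
t = np.arange(n)
x_true = 20.0 + 8.0 * np.cos(2 * np.pi * t / 7)
reports = [(s, 5) for s in range(0, 22, 3)] + [(0, n)]
A = aggregation_matrix(n, reports)
y = A @ x_true

x_hat = smooth_reconstruct(A, y, lam=1.0)     # phase 1
for _ in range(3):                            # phase 2, applied iteratively
    h = learn_annihilating_filter(x_hat, order=3)
    x_hat = refine(A, y, h, lam=1.0)

rmse = np.sqrt(np.mean((x_hat - x_true) ** 2))
print(f"RMSE after refinement: {rmse:.3f}")
```

In this sketch the filter is re-learned from the current estimate at every pass, mirroring the iterative use of the second phase. How well the loop converges depends on the report layout, the filter order, and `lam`, none of which are taken from the paper.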
