Information integration over time in unreliable and uncertain environments

Often the true value of a quantity of interest, such as a stock price, sports score, or current temperature, is available only through the observations of noisy and potentially conflicting sources. Several techniques have been proposed to reconcile these conflicts by computing a weighted consensus based on source reliabilities, but these techniques focus on static values. When the real-world entity evolves over time, the noisy sources can delay, or even miss, reporting some of the real-world updates. This temporal aspect introduces two key challenges for consensus-based approaches: (i) because of delays, the mapping between a source's noisy observation and the real-world update it observes is unknown, and (ii) missed updates may translate into missing values for the consensus problem, even if the mapping is known. To overcome these challenges, we propose a formal approach that models the history of updates of the real-world entity as a hidden semi-Markovian process (HSMM). The noisy sources are modeled as observations of the hidden state, but the mapping between a hidden state (i.e., a real-world update) and an observation (i.e., a source value) is unknown. We propose algorithms based on Gibbs sampling and EM to jointly infer both the history of real-world updates and the unknown mapping between them and the source values. We demonstrate, using experiments on real-world datasets, how our history-based techniques improve upon history-agnostic consensus-based approaches.
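
For reference, the kind of history-agnostic weighted consensus that the abstract contrasts against might be sketched as below. This is only an illustration under stated assumptions, not the method of the paper: the function name `weighted_consensus`, the source names, and the reliability weights are all hypothetical, and the reliabilities are assumed to be known in advance. The snapshot shows how delayed sources can outvote a timely one, and how a missed update simply drops out of the vote.

```python
from collections import defaultdict


def weighted_consensus(observations, reliability):
    """Return the reported value with the largest total reliability weight.

    observations: dict mapping source name -> reported value (None if missing)
    reliability:  dict mapping source name -> weight (assumed known here)
    """
    scores = defaultdict(float)
    for source, value in observations.items():
        if value is None:
            # A missed update contributes no vote at all.
            continue
        scores[value] += reliability[source]
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical weights: source A is the most reliable.
reliability = {"A": 0.9, "B": 0.6, "C": 0.5}

# Hypothetical snapshot taken just after the true value changed from 10 to 12.
# A has already reported the update; B and C are delayed and still report 10.
delayed_snapshot = {"A": 12, "B": 10, "C": 10}
stale = weighted_consensus(delayed_snapshot, reliability)

# Same moment, but B has no current report (a missing value).
missing_snapshot = {"A": 12, "B": None, "C": 10}
fresh = weighted_consensus(missing_snapshot, reliability)
```

In the first snapshot the two delayed sources carry a combined weight of 1.1 against 0.9, so the consensus returns the outdated value 10. In the second, the missing report leaves 0.9 against 0.5 and the consensus returns 12. A snapshot-only consensus cannot tell a delayed report from a genuine disagreement, which is the gap a history-based model is meant to address.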
