Computational Fact Checking through Query Perturbations

Our media is saturated with claims of “facts” made from data. Database research has traditionally focused on how to answer queries, and has devoted little attention to discerning subtler qualities of the resulting claims, such as whether a claim is “cherry-picking.” This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, through its choice of parameter settings, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusion significantly. This framework allows us to formulate practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
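
To make the perturbation idea concrete, the following is a minimal sketch in Python, not the article's actual algorithms. It assumes a claim of the form "the count rose by some amount over the `width` years ending in `end_year`" and models it as a query with those two parameters. The data values, the function names (`window_change`, `perturb`, `support_fraction`), and the "at least as strong" comparison are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical yearly counts, made up purely for illustration.
series = {2001: 50, 2002: 52, 2003: 47, 2004: 49,
          2005: 40, 2006: 48, 2007: 51, 2008: 60}

def window_change(data, end_year, width):
    """Parameterized query: change in value over the `width` years ending at `end_year`."""
    return data[end_year] - data[end_year - width]

def perturb(data, end_year, width, max_shift=1):
    """Evaluate the query at every parameter setting within `max_shift` of the original.

    Settings that fall outside the available data are skipped.
    """
    results = {}
    for d_end, d_width in product(range(-max_shift, max_shift + 1), repeat=2):
        e, w = end_year + d_end, width + d_width
        if w >= 1 and e in data and (e - w) in data:
            results[(e, w)] = window_change(data, e, w)
    return results

def support_fraction(data, end_year, width, max_shift=1):
    """Return the original result and the fraction of perturbed settings
    (the original included) whose result is at least as strong."""
    original = window_change(data, end_year, width)
    results = perturb(data, end_year, width, max_shift)
    as_strong = sum(1 for r in results.values() if r >= original)
    return original, as_strong / len(results)

if __name__ == "__main__":
    original, fraction = support_fraction(series, end_year=2008, width=3)
    print(f"claimed rise: {original}")
    print(f"share of nearby settings at least as strong: {fraction:.2f}")
```

On this made-up series, the claimed setting gives a rise of 20, but no other nearby setting gives a rise that large. A low share like this is the kind of signal that would suggest the parameters were chosen selectively. A realistic instantiation would also need a principled way to weight perturbations and to compare conclusions, which this sketch leaves out.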
