Quality-Aware Query Based on Relative Source Quality

In many settings, such as the Internet of Things or data fusion, more than one source provides data about the same object, but the data quality of the sources differs. Therefore, when a query involves sources that may provide low-quality data, the query results should still consist of high-quality data. In this paper, we define the quality-aware query and build a model of the quality-aware query scenario, which aims to obtain high-quality results from multiple sources that may have different data quality scores. An uncertain graph is used to model the relative source quality, and a method for computing the quality of the query results is provided.
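To make the uncertain-graph idea concrete, the following is a minimal sketch and not the paper's actual model or algorithm. It assumes that each edge (u, v, p) states "source u has higher quality than source v" with probability p, that edges are independent, and that possible-world semantics apply. The names `EDGES`, `SOURCES`, and `undominated_probabilities` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical uncertain graph (an assumption, not taken from the paper):
# edge (u, v, p) means "source u is of higher quality than source v"
# and exists with probability p, independently of the other edges.
SOURCES = ["A", "B", "C"]
EDGES = [("A", "B", 0.9), ("B", "C", 0.6), ("A", "C", 0.5)]


def undominated_probabilities(sources, edges):
    """For each source, return the probability that no other source is
    better than it, by enumerating every possible world of the graph."""
    prob = {s: 0.0 for s in sources}
    # Each possible world keeps or drops every edge: 2^len(edges) worlds.
    for world in product([False, True], repeat=len(edges)):
        weight = 1.0
        dominated = set()
        for keep, (u, v, p) in zip(world, edges):
            weight *= p if keep else 1.0 - p
            if keep:
                dominated.add(v)  # v has a better source in this world
        for s in sources:
            if s not in dominated:
                prob[s] += weight
    return prob


if __name__ == "__main__":
    for source, p in undominated_probabilities(SOURCES, EDGES).items():
        print(f"{source}: {p:.2f}")
```

Exhaustive enumeration is exponential in the number of edges, so this only suits tiny illustrative graphs. A quality-aware query could, under these assumptions, prefer answers from sources with a high probability of being undominated.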
