Relative Accuracy Evaluation

The quality of data plays an important role in business analysis and decision making, and accuracy is a key aspect of data quality. Evaluating data accuracy is therefore a necessary task in data quality management. Moreover, a data set whose overall accuracy is low may still contain useful parts whose accuracy is high, so it is also necessary to evaluate the accuracy of query results, which we call relative accuracy. However, to the best of our knowledge, neither a measure of relative accuracy nor an effective method for evaluating it has been proposed. Motivated by this, we propose a systematic method for relative accuracy evaluation. We design a relative accuracy evaluation framework for relational databases based on a new statistics-based accuracy metric. We apply this framework to evaluate the precision and recall of basic queries, which reflect the relative accuracy of the query results. We also propose methods to handle data updates and to improve accuracy evaluation using functional dependencies. Extensive experimental results show the effectiveness and efficiency of the proposed framework and algorithms.
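
The abstract does not define its metric, so the following is only a minimal sketch of what precision and recall of a query result mean, assuming the set of correct answers is known (for example, from a manually verified copy of the data). The helper names `select`, `precision_recall` and `over_100`, and the toy `price` table, are hypothetical illustrations; the paper's statistics-based evaluation is not reproduced here.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

# A row is a mapping from attribute name to value (toy representation).
Row = Dict[str, object]


def select(rows: List[Row], predicate: Callable[[Row], bool], key: str) -> Set[Hashable]:
    """Return the key values of the rows that satisfy a selection predicate."""
    return {row[key] for row in rows if predicate(row)}


def precision_recall(returned: Set[Hashable], correct: Set[Hashable]) -> Tuple[float, float]:
    """Compare a query result with the set of correct answers.

    precision = |returned & correct| / |returned|
    recall    = |returned & correct| / |correct|
    Empty denominators are treated as 1.0 (an assumed convention).
    """
    hits = len(returned & correct)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(correct) if correct else 1.0
    return precision, recall


# Hypothetical stored table in which some prices are inaccurate.
stored = [
    {"id": 1, "price": 120},
    {"id": 2, "price": 90},   # true price is 110: the query misses it
    {"id": 3, "price": 150},
    {"id": 4, "price": 105},  # true price is 95: the query wrongly returns it
    {"id": 5, "price": 80},   # true price is 200: the query misses it
]

# Hypothetical ground truth for the same tuples.
truth = [
    {"id": 1, "price": 120},
    {"id": 2, "price": 110},
    {"id": 3, "price": 150},
    {"id": 4, "price": 95},
    {"id": 5, "price": 200},
]


def over_100(row: Row) -> bool:
    """Selection predicate for the example query: price > 100."""
    return row["price"] > 100


returned = select(stored, over_100, "id")  # answers computed on the stored data
correct = select(truth, over_100, "id")    # answers computed on the true data
p, r = precision_recall(returned, correct)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Precision penalizes tuples that an inaccurate value causes the query to return wrongly (id 4), while recall penalizes correct answers that inaccurate values hide from the query (ids 2 and 5). Both are therefore needed to describe the accuracy of a particular query result rather than of the whole table.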