Managerial decision support with knowledge of accuracy and completeness of the relational aggregate functions

Managers use aggregate data produced by decision support systems to run and improve their firm's operations. The data residing in corporate databases and data warehouses are often far from perfect, and their imperfections affect decision quality and outcomes. Knowing how data errors affect aggregate data could therefore lead to more informed decisions, reduced risk, and competitive advantage. In this paper, we present a methodology for estimating the effects of two important data quality dimensions, accuracy and completeness, on the relational aggregate functions Count, Sum, Average, Max, and Min. Our methodology defines a set of attribute value types and deploys sampling strategies to determine maximum likelihood estimates for each value type. We show the effect of data error rates on the scalar values returned by the aggregate functions and demonstrate the efficiency of our estimates through Monte Carlo simulations.
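As a rough illustration of the sampling idea, the sketch below draws a simple random sample from a simulated column, labels each sampled value with a value type, and uses the sample frequencies (the maximum likelihood estimates of category proportions under a multinomial model) to scale Count. The value type names (`accurate`, `inaccurate`, `missing`), the error rates, and all function names are hypothetical and chosen only for illustration; the paper's actual value types and estimators are not given in this abstract.

```python
import random
from collections import Counter


def estimate_value_type_proportions(sample_labels):
    """Return sample frequencies per value type.

    Under a multinomial model these frequencies are the maximum
    likelihood estimates of the category proportions.
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(sample_labels)
    return {label: count / n for label, count in counts.items()}


def estimate_counts(proportions, population_size):
    """Scale estimated proportions to estimated counts in the full column."""
    return {label: p * population_size for label, p in proportions.items()}


def simulate(population_size=10_000, sample_size=500, seed=0):
    """One Monte Carlo replication with a hypothetical mix of value types."""
    rng = random.Random(seed)
    # Assumed (illustrative) true proportions of each value type.
    true_mix = {"accurate": 0.90, "inaccurate": 0.07, "missing": 0.03}
    population = rng.choices(
        list(true_mix), weights=list(true_mix.values()), k=population_size
    )
    # Simple random sample without replacement.
    sample = rng.sample(population, sample_size)
    proportions = estimate_value_type_proportions(sample)
    return proportions, estimate_counts(proportions, population_size)


if __name__ == "__main__":
    proportions, counts = simulate()
    for label in sorted(proportions):
        print(f"{label}: p_hat={proportions[label]:.3f}, count_hat={counts[label]:.0f}")
```

Repeating `simulate` with different seeds and comparing the estimated counts against the known mix gives a simple Monte Carlo check of how closely the estimates track the true error rates.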
