Data Cleansing Decisions: Insights from Discrete-Event Simulations of Firm Resources and Data Quality

The cost of poor data quality has been measured in the billions of dollars annually. Deriving coherent data-cleansing strategies to improve data quality is nonetheless challenging, because the financial and human capital costs of cleaning data are often difficult to justify. Yet those who have planned and designed an effective approach to cleaning data report significant benefits. Although the extant literature has focused extensively on data quality issues, little attention has been directed toward decision-making techniques that help practitioners determine the costs and benefits of adopting data-cleansing approaches. This study advances an approach that illustrates how discrete-event simulation can serve as a decision tool for data-cleansing decisions by clarifying the interactions among a firm's resources and performance outcomes. To our knowledge, this is one of the first studies to apply discrete-event simulation to the evaluation of data-cleansing approaches. The article contributes to an understanding of how various organizational resources interact within, and between, two data-cleansing approaches to drive performance outcomes. Simulation approaches such as the one examined here reveal how the complexity of interactions among such factors can produce results that are difficult to anticipate using other approaches.
