DATA INVESTIGATION: ISSUES OF DATA QUALITY AND IMPLEMENTING BASE ANALYSIS TECHNIQUE TO EVALUATE QUALITY OF DATA IN HETEROGENEOUS DATABASES

Data investigation is the process of understanding the nature of data in heterogeneous databases. Many organizations use online transaction systems to support their operations. The diversity of application systems used to support an organization may lead to data anomalies, often without the system owners realizing how insufficient information about the data harms decision making. The quality of the results of any analysis is only as good as the quality of the inputs (the data) that feed that analysis. Data quality therefore remains a major factor in the successful operation of IT. Introducing new technologies such as grid systems, ETL applications, and the semantic web is meaningless if the data lack quality. To avoid the “Garbage In, Garbage Out” effect, we propose a technique for understanding the nature of data, which we refer to as the Base Analysis Technique (BAT). BAT is used to profile heterogeneous data in a structured way, with the intention of identifying abnormal data. The technique comprises three levels of analysis: Top Level Analysis, Middle Level Analysis, and Low Level Analysis. The Data Quality Analysis System (DQAS) is a tool developed using open source technologies and connected to commercial databases; it supports the implementation of BAT in a three-tier architecture. This paper describes issues surrounding data quality and how BAT evaluates the quality of data in heterogeneous databases.
