Ontology-Based Data Quality Management for Data Streams

Data Stream Management Systems (DSMS) process data efficiently in real time, but they always face a tradeoff between data quality (DQ) and performance. We propose an ontology-based data quality framework for relational DSMS that supports DQ measurement and monitoring in a transparent, modular, and flexible way. Our threefold approach takes the characteristics of relational data stream management into account when defining DQ metrics: (1) Query Metrics capture changes in data quality caused by query operations, (2) Content Metrics allow the semantic evaluation of data in the streams, and (3) Application Metrics allow easy user-defined computation of data quality values to account for application specifics. Additionally, a quality monitor allows us to observe data quality values and take countermeasures to balance data quality and performance. The framework has been designed along a DQ management methodology suited for data streams and has been evaluated in the domains of transportation systems and health monitoring.
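
To make the interplay of a metric and the quality monitor more concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a hypothetical window-based completeness metric (the share of recent tuples with a non-missing attribute value) and a hypothetical monitor that calls a user-supplied function when the value drops below a threshold. All names (`WindowCompleteness`, `QualityMonitor`, `run`) and the example attribute `speed` are assumptions made for illustration; the ontology-based parts of the framework are omitted entirely.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional

# A stream tuple is modeled as a plain dict; None marks a missing value.
StreamTuple = Dict[str, Optional[float]]


class WindowCompleteness:
    """Hypothetical metric: fraction of the last `size` tuples whose
    `attribute` is present (not None)."""

    def __init__(self, attribute: str, size: int) -> None:
        self.attribute = attribute
        # deque with maxlen drops the oldest entry automatically,
        # which gives a count-based sliding window.
        self.window: Deque[bool] = deque(maxlen=size)

    def update(self, tup: StreamTuple) -> float:
        self.window.append(tup.get(self.attribute) is not None)
        return sum(self.window) / len(self.window)


class QualityMonitor:
    """Hypothetical monitor: invokes `on_violation` whenever an observed
    quality value falls strictly below `threshold`."""

    def __init__(self, threshold: float,
                 on_violation: Callable[[float], None]) -> None:
        self.threshold = threshold
        self.on_violation = on_violation

    def observe(self, value: float) -> None:
        if value < self.threshold:
            self.on_violation(value)


def run(stream: Iterable[StreamTuple], attribute: str,
        size: int, threshold: float) -> List[float]:
    """Feed a stream through the metric and collect violating values."""
    violations: List[float] = []
    metric = WindowCompleteness(attribute, size)
    monitor = QualityMonitor(threshold, violations.append)
    for tup in stream:
        monitor.observe(metric.update(tup))
    return violations


if __name__ == "__main__":
    readings = [{"speed": 50.0}, {"speed": None},
                {"speed": None}, {"speed": 60.0}]
    print(run(readings, "speed", size=2, threshold=0.5))
```

In a real deployment the violation callback would be the place to trigger a countermeasure, for example adjusting window sizes or sampling rates; which countermeasures the framework actually supports is not stated here.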
