Report on the Dagstuhl Seminar

Over the past few years, techniques for managing, querying, and integrating data on the Web have significantly matured. Well-founded and practical approaches to assess or even guarantee a required degree of quality of the data in these frameworks, however, are still missing. This can be attributed to the lack of well-defined data quality metrics and assessment techniques, and to the difficulty of handling information about data quality during data integration and query processing. Data quality problems arise in many settings, such as the integration of business data, Web mining, data dissemination, and querying the Web using search engines. Data quality (DQ) concerns various forms of data, including structured and semistructured data, text documents, multimedia, and streaming data. Different forms of metadata describing the quality of data are becoming increasingly important since they provide applications and users with information about the value and reliability of (integrated) data on the Web. The Dagstuhl Seminar “Data Quality on the Web”, organized by Michael Gertz, Tamer Ozsu, Gunter Saake, and Kai-Uwe Sattler, took place from August 31st to September 5th, 2003, at Schloss Dagstuhl, Germany. The objective of the seminar was to (1) foster collaboration among researchers who deal with DQ in different areas, (2) assess existing results in managing the quality of data, and (3) establish a framework for future research in the area of DQ. The application contexts considered during the seminar included, in particular, (Web-based) data integration and information retrieval scenarios, scientific databases, and application domains in the computational sciences and bioinformatics. In all these areas, data quality plays a crucial role, and therefore different, tailored solutions have been developed. Sharing and exchanging this knowledge could result in significant synergies.