From Data Quality to Big Data Quality: A Data Integration Scenario

Big data has made its appearance in many fields, including scientific research, business and public administration. Although it is acknowledged that different aspects (e.g., data acquisition, extraction, pre-processing, analysis modelling and functionality, interpretation) may affect the benefit derived from such data, several authors identify data quality as the most decisive one. More recently, a variety of data types have arisen from linguistic and visual information, used and diffused through social networks, the Internet of Things, enterprise and public sector information systems, and the Web. The big data phenomenon has thus deeply affected the diversity of data types. In our previous work, we provided an in-depth investigation of how data quality concepts can be extended to such a vast set of data types, encompassing, e.g., semi-structured texts, maps, images and linked data. In this work, we focus on Linked Data, a type of data that can be viewed as big data, and study the effect of data quality in a data integration scenario.
