Matching Techniques for Data Integration and Exploration: From Databases to Big Data

Over the last two decades, data matching has been addressed for different purposes and in different application contexts, ranging from data integration, to ontology evolution, to semantic data clouding, and, more recently, to exploratory data analysis over large datasets and big data. This paper describes the evolution of research activity on matching techniques for data integration and exploration at the ISLab group of the Università degli Studi di Milano. We analyze the matching techniques according to the structure of the target data, the algorithmic pattern of the matching process, and the application focus, and we discuss the results of applying our techniques to the exploratory analysis of a real dataset composed of all the publications in the SEBD proceedings from 1993 to 2016.
