Relationship Matching of Data Sources: A Graph-Based Approach

Relationship matching is a key step in transforming structured data sources, such as relational databases and spreadsheets, into a common data model. The matching task is the automatic identification of correspondences between the relationships among source columns and the relationships of the common data model. Numerous techniques have been developed for this purpose. However, existing work does not recognize relationship types between entities at the instance level of the data obtained from data sources, nor does it resolve the resulting ambiguities. In this paper, we develop a method for resolving ambiguous relationship types between entity instances in structured data. The proposed method can be used as a standalone matching technique or to complement existing relationship matching techniques for data sources. An evaluation on a large real-world data set shows that our approach achieves high accuracy (>80%).
