Keys as Features for Graph Entity Matching

Keys for graphs aim to uniquely identify entities represented by vertices in a graph, using a combination of topological constraints and value equality constraints. This paper proposes graph matching keys (GMKs), an extension of graph keys with similarity predicates on values, to support approximate entity matching. We treat entity matching as a classification problem and propose GMKSLEM, a supervised learning method for graph entity matching. In GMKSLEM, a feature extraction method discovers candidate GMKs (CGMKs), which serve as features for a vector representation; feature selection then yields high-quality features and representations. Moreover, GMKSLEM supports explaining the classification results. Using real-life data, we experimentally verify the effectiveness of GMKSLEM, as well as its interpretability.
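To make the idea of keys as features concrete, the following is a minimal sketch, not the method from the paper. It assumes a simplified setting in which each candidate key is just a list of (attribute, threshold) similarity predicates; the topological constraints of real GMKs are omitted. The names `similar`, `satisfies`, `featurize`, and `candidate_keys` are hypothetical, and the string similarity from Python's `difflib` stands in for whatever similarity predicates a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical simplification: a candidate key is a list of
# (attribute, threshold) similarity predicates. The topological
# (graph pattern) part of a real GMK is deliberately left out.

def similar(a, b, threshold):
    """Return True when the string similarity of a and b reaches threshold."""
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold

def satisfies(key, e1, e2):
    """A pair satisfies a key when every predicate of the key holds."""
    return all(
        attr in e1 and attr in e2 and similar(e1[attr], e2[attr], t)
        for attr, t in key
    )

def featurize(keys, e1, e2):
    """Build one binary feature per candidate key for an entity pair."""
    return [1 if satisfies(k, e1, e2) else 0 for k in keys]

# Illustrative candidate keys (assumed, not taken from the paper).
candidate_keys = [
    [("title", 0.8)],
    [("title", 0.8), ("year", 1.0)],
    [("artist", 0.9), ("year", 1.0)],
]

a = {"title": "Abbey Road", "artist": "The Beatles", "year": 1969}
b = {"title": "Abbey Road", "artist": "Beatles", "year": 1969}

# Each position in the vector records whether the corresponding key fired.
vector = featurize(candidate_keys, a, b)
print(vector)
```

Under these assumptions, a classifier trained on such vectors can be inspected feature by feature: each position corresponds to one candidate key, so a prediction can be traced back to the keys that the entity pair satisfied.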
