Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations

In knowledge bases and information extraction results, differently expressed relations can be semantically similar (e.g., (X, wrote, Y) and (X, 's written work, Y)). Grouping such relations into clusters would therefore benefit many applications, including knowledge base completion, information extraction, and information retrieval. This paper formulates relation clustering as a constrained tripartite graph clustering problem, presents an efficient clustering algorithm, and demonstrates the advantage of the constrained framework. We introduce several ways of providing side information through must-link and cannot-link constraints to improve the clustering results. Unlike traditional semi-supervised learning approaches, we use the similarity of relation expressions and knowledge of entity types to construct the constraints automatically. We show improved relation clustering results on two datasets, one extracted from a human-annotated knowledge base (Freebase) and one from open information extraction results (ReVerb data).
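As an illustration of how must-link and cannot-link constraints might be derived automatically from relation expressions and entity types, the following is a minimal sketch rather than the paper's actual procedure. The token-level Jaccard similarity, the 0.5 threshold, the rule that mismatched entity types yield a cannot-link, and all names (`token_jaccard`, `build_constraints`, the toy `relations` dictionary) are assumptions made for this example.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase token sets of two expressions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_constraints(relations, must_threshold=0.5):
    """Build pairwise constraints between relations.

    relations: dict mapping a relation name to a tuple
               (expression, subject_type, object_type).
    Returns (must_link, cannot_link), each a set of name pairs.

    Assumed rules (hypothetical, for illustration only):
      - different (subject_type, object_type) signature -> cannot-link
      - same signature and similar expressions          -> must-link
    """
    must_link, cannot_link = set(), set()
    for r1, r2 in combinations(sorted(relations), 2):
        expr1, s1, o1 = relations[r1]
        expr2, s2, o2 = relations[r2]
        if (s1, o1) != (s2, o2):
            cannot_link.add((r1, r2))
        elif token_jaccard(expr1, expr2) >= must_threshold:
            must_link.add((r1, r2))
    return must_link, cannot_link


# Toy input: two paraphrases of authorship and one unrelated relation.
relations = {
    "author_of": ("is author of", "person", "book"),
    "author_of_alt": ("is the author of", "person", "book"),
    "born_in": ("was born in", "person", "location"),
}
must_link, cannot_link = build_constraints(relations)
print("must-link:", sorted(must_link))
print("cannot-link:", sorted(cannot_link))
```

In this toy input the two authorship expressions share an entity-type signature and most of their tokens, so they receive a must-link, while the birthplace relation differs in object type and receives a cannot-link with both. A constrained clustering algorithm would then take such pairs as side information.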
