Combining Syntactic and Sequential Patterns for Unsupervised Semantic Relation Extraction.

This work investigates the impact of syntactic features in a completely unsupervised semantic relation extraction experiment. Automated relation extraction deals with identifying semantic relation instances in a text and classifying them according to the type of relation. This task is essential in information and knowledge extraction and in knowledge base population. Supervised relation extraction systems rely on annotated examples [ , – , ] and extract different kinds of features from the training data, and possibly from external knowledge sources. The types of extracted relations are necessarily limited to a pre-defined list. In Open Information Extraction (OpenIE) [ , ], relation types are inferred directly from the data: concept pairs representing the same relation are grouped together, and relation labels can be generated from context segments or assigned by domain experts [ , , ]. A commonly used method [ , ] is to represent entity pairs by a pair-pattern matrix and to cluster relation instances according to the similarity of their distributions over patterns. Pattern-based approaches [ , , , , ] typically use lexical context patterns, assuming that the semantic relation between two entities is explicitly mentioned in the text. Patterns can be defined manually [ ], obtained by Latent Relational Analysis [ ], or extracted from a corpus by sequential pattern mining [ , , ]. Previous work, especially in the biomedical domain, has shown that not only lexical patterns but also syntactic dependency trees can be beneficial in supervised and semi-supervised relation extraction [ , , – ]. Early experiments on combining lexical patterns with different types of distributional information in unsupervised relation clustering did not bring significant improvement [ ]. The underlying difficulty is that while supervised classifiers can learn to weight attributes from different sources, it is not trivial to combine different types of features in a single clustering feature space.
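The pair-pattern representation mentioned above can be illustrated with a minimal sketch. The toy occurrences, the pattern strings, and the helper names `build_pair_pattern_matrix` and `cosine` are assumptions made for illustration only; they are not taken from the experiments. The sketch only shows how concept pairs that share context patterns end up with similar row vectors, which is the property a clustering algorithm would then exploit.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each item is (concept pair, context pattern).
occurrences = [
    (("parser", "treebank"), "X trained on Y"),
    (("parser", "treebank"), "X evaluated on Y"),
    (("tagger", "corpus"), "X trained on Y"),
    (("tagger", "corpus"), "X evaluated on Y"),
    (("BLEU", "translation quality"), "X measures Y"),
]


def build_pair_pattern_matrix(occurrences):
    """Return (pairs, patterns, matrix), where matrix[i][j] counts
    how often pattern j was observed with concept pair i."""
    counts = Counter(occurrences)
    pairs = sorted({pair for pair, _ in occurrences})
    patterns = sorted({pattern for _, pattern in occurrences})
    matrix = [[counts[(pair, pattern)] for pattern in patterns]
              for pair in pairs]
    return pairs, patterns, matrix


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


pairs, patterns, matrix = build_pair_pattern_matrix(occurrences)
row = {pair: vector for pair, vector in zip(pairs, matrix)}

# Pairs with the same pattern distribution are maximally similar;
# pairs with no shared pattern have similarity zero.
sim_same = cosine(row[("parser", "treebank")], row[("tagger", "corpus")])
sim_diff = cosine(row[("parser", "treebank")], row[("BLEU", "translation quality")])
print(sim_same, sim_diff)
```

In this toy setting, the two pairs that share both patterns would fall into one cluster, while the third pair stays apart. Raw counts and cosine similarity are one simple choice among several; weighting schemes and other similarity measures are equally plausible.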
In our experiments, we propose to combine syntactic features with sequential lexical patterns for unsupervised clustering of semantic relation instances in the context of (NLP-related) scientific texts. We replicate the experiments of [ ] and augment them with dependency-based syntactic features. We adopt a pair-pattern matrix for clustering relation instances. The task can be described as follows: if a1, a2, b1, b2 are pre-annotated domain concepts extracted from a corpus, we would like to classify concept pairs a = (a1, a2) and b = (b1, b2) into homogeneous groups according to their semantic relation. We need an efficient

[1] Luciano Del Corro, et al. ClausIE: clause-based open information extraction, 2013, WWW.

[2] Oren Etzioni, et al. Identifying Relations for Open Information Extraction, 2011, EMNLP.

[3] David J. Weir, et al. Learning to Distinguish Hypernyms and Co-Hyponyms, 2014, COLING.

[4] Isabelle Tellier, et al. Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature, 2016, LREC.

[5] Saif Mohammad, et al. Experiments with three approaches to recognizing lexical entailment, 2014, Natural Language Engineering.

[6] Bruno Crémilleux, et al. Discovering Linguistic Patterns Using Sequence Mining, 2012, CICLing.

[7] Isabelle Tellier, et al. Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining, 2016, IDA.

[8] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.

[9] Ralf Zimmer, et al. RelEx - Relation extraction using dependency parse trees, 2007, Bioinform.

[10] Peter D. Turney. Domain and Function: A Dual-Space Model of Semantic Relations and Compositions, 2012, J. Artif. Intell. Res.

[11] Razvan C. Bunescu, et al. A Shortest Path Dependency Kernel for Relation Extraction, 2005, HLT.

[12] George Karypis, et al. Evaluation of hierarchical clustering algorithms for document datasets, 2002, CIKM '02.

[13] George Karypis, et al. Hierarchical Clustering Algorithms for Document Datasets, 2005, Data Mining and Knowledge Discovery.

[14] Jian Su, et al. Exploring Various Knowledge in Relation Extraction, 2005, ACL.

[15] Marco Baroni, et al. BagPack: A General Framework to Represent Semantic Relations, 2009, ArXiv.