Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

Automatically recognizing and typing entities in natural-language text without prior knowledge (e.g., predefined entity types) is a major challenge. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation derived from knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm that types all mentions using these representations. The framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forums) show performance comparable to state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
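
The abstract does not spell out the joint clustering-and-linking algorithm. The sketch below is a rough illustration only and makes several assumptions. It assumes each mention's three representations are fixed-length vectors that can simply be concatenated. It then groups mentions with plain average-linkage agglomerative clustering under cosine distance. Finally, it names each cluster by a majority vote over the knowledge-base types of whichever mentions happen to be linked. The toy vectors, the 0.5 distance threshold, the type labels, and the helper names (mention_vector, agglomerative_cluster, name_clusters) are all hypothetical. This is a stand-in technique, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mention_vector(general: Sequence[float], context: Sequence[float],
                   knowledge: Sequence[float]) -> Vector:
    """Hypothetical: concatenate the three per-mention representations."""
    return list(general) + list(context) + list(knowledge)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / (nu * nv)


def agglomerative_cluster(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering over cosine distance.
    Merging stops once the closest pair of clusters is farther than `threshold`."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]
    while len(clusters) > 1:
        best: Optional[Tuple[float, int, int]] = None  # (distance, x, y)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                d = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def name_clusters(clusters: List[List[int]], kb_types: List[Optional[str]]) -> List[str]:
    """Hypothetical naming step: majority KB type among linked mentions."""
    names = []
    for members in clusters:
        votes = Counter(kb_types[i] for i in members if kb_types[i] is not None)
        names.append(votes.most_common(1)[0][0] if votes else "UNKNOWN")
    return names


# Toy usage with made-up 2-d vectors for each representation.
mentions = ["Obama", "Merkel", "Paris", "Berlin"]
vectors = [
    mention_vector([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    mention_vector([0.9, 0.1], [1.0, 0.0], [1.0, 0.0]),
    mention_vector([0.0, 1.0], [0.1, 0.9], [0.0, 1.0]),
    mention_vector([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
]
kb_types = ["Person", "Person", "City", None]  # None: mention not linked to the KB
clusters = agglomerative_cluster(vectors, threshold=0.5)
names = name_clusters(clusters, kb_types)
for name, members in zip(names, clusters):
    print(name, [mentions[i] for i in members])
```

On this toy input the two person mentions and the two city mentions form separate clusters. Berlin is never linked in this example, yet its cluster is still named "City" because Paris shares it. In this simplified setting, that is the only sense in which clustering and linking reinforce each other.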