ICRC-DSEDL: A Film Named Entity Discovery and Linking System Based on Knowledge Bases

Named entity discovery and linking are hot topics in text mining and are important for text understanding, because named entities usually appear in various forms and some of them are ambiguous. To accelerate the development of related technology, the China Conference on Knowledge Graph and Semantic Computing (CCKS) launched a competition in 2016 that includes a task on film named entity discovery and linking (i.e., Task 1). We participated in this competition and developed a system for Task 1. The system consists of two separate parts, one for named entity discovery (NED) and one for entity linking (EL). The first part is a hybrid subsystem based on conditional random fields (CRF) and a structural support vector machine (SSVM) with rich features. The second part is a ranking subsystem in which both the given knowledge base and open knowledge bases are used for candidate generation, and SVMrank is used for candidate ranking. On the official test dataset of Task 1 of the CCKS 2016 competition, our system achieves an F1-score of 77.83% on NED, an accuracy of 86.53% on EL, and an overall F1-score of 67.35%.
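The two-stage design described above (discover mentions, then generate and rank candidate entities) can be illustrated with a minimal Python sketch. This is not the system's implementation: the dictionary lookup stands in for the CRF/SSVM tagger, the keyword-overlap score stands in for SVMrank, and the knowledge base `TOY_KB`, the `Entity` class, and all function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str        # hypothetical identifier, not from the CCKS knowledge base
    aliases: frozenset    # lowercase surface forms that may refer to the entity
    keywords: frozenset   # lowercase context words associated with the entity


# Toy knowledge base (an assumption for illustration only).
TOY_KB = [
    Entity("film_hero_2002", frozenset({"hero"}),
           frozenset({"zhang", "yimou", "jet", "li"})),
    Entity("film_hero_1992", frozenset({"hero"}),
           frozenset({"dustin", "hoffman"})),
    Entity("film_titanic_1997", frozenset({"titanic"}),
           frozenset({"cameron", "dicaprio"})),
]


def discover_mentions(tokens):
    """Stage 1 stand-in: flag single tokens that match a known alias.

    The described system uses a CRF/SSVM sequence labeler here instead.
    """
    known = {alias for entity in TOY_KB for alias in entity.aliases}
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in known]


def generate_candidates(mention):
    """Stage 2a: collect every KB entity whose aliases include the mention."""
    return [e for e in TOY_KB if mention.lower() in e.aliases]


def rank_candidates(candidates, tokens):
    """Stage 2b stand-in: order candidates by keyword overlap with the context.

    The described system learns this ordering with SVMrank instead.
    """
    context = {t.lower() for t in tokens}
    return sorted(candidates, key=lambda e: len(e.keywords & context), reverse=True)


def link(text):
    """Run both stages and return (token position, mention, entity id) triples."""
    tokens = text.split()
    results = []
    for pos, mention in discover_mentions(tokens):
        ranked = rank_candidates(generate_candidates(mention), tokens)
        results.append((pos, mention, ranked[0].entity_id if ranked else None))
    return results


if __name__ == "__main__":
    print(link("Jet Li stars in Hero directed by Zhang Yimou"))
```

In this sketch the ambiguous mention "Hero" has two candidates, and the surrounding words decide which one ranks first. A learned ranker would replace the overlap count with a trained scoring function over richer features.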