Paper information - Visualizing Multi-document Semantics via Open Domain Information Extraction

Faced with the overwhelming amounts of data in the 24/7 stream of new articles appearing online, it is often helpful to consider only the key entities and concepts and their relationships. This is challenging, as relevant connections may be spread across a number of disparate articles and sources. In this paper, we present a system that extracts salient entities, concepts, and their relationships from a set of related documents, discovers connections within and across them, and presents the resulting information in a graph-based visualization. We rely on a series of natural language processing methods, including open-domain information extraction, a special filtering method to maintain only meaningful relationships, and a heuristic to form graphs with a high coverage rate of topic entities and concepts. Our graph visualization then allows users to explore these connections. In our experiments, we rely on a large collection of news crawled from the Web and show how connections within this data can be explored. Code related to this paper is available at: https://shengyp.github.io/vmse.
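The abstract names three stages: extracting relation triples, filtering them, and forming a graph that covers many topic entities. The following is a minimal sketch of how such stages could fit together, not the authors' implementation. The sample triples, the topic set, the filtering rule (keep a triple only if both arguments are topic entities), and the greedy coverage heuristic are all assumptions made for illustration, and the function names `filter_triples`, `select_edges`, and `build_graph` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for the
# output of an open-domain information extraction step.
TRIPLES = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "is based in", "Berlin"),
    ("Acme Corp", "announced", "it"),
    ("Berlin", "hosted", "Tech Summit"),
    ("Acme Corp", "sponsored", "Tech Summit"),
]

# Hypothetical set of salient topic entities and concepts.
TOPICS = {"Acme Corp", "Widget Inc", "Berlin", "Tech Summit"}


def filter_triples(triples, topics):
    """Assumed filter: keep triples whose subject and object are both topics."""
    return [t for t in triples if t[0] in topics and t[2] in topics]


def select_edges(triples, topics, max_edges):
    """Greedily pick triples that add the most not-yet-covered topics.

    This is an illustrative coverage heuristic, not the one from the paper.
    Returns the chosen triples and the set of topics they cover.
    """
    covered, chosen = set(), []
    remaining = list(triples)
    while remaining and len(chosen) < max_edges:
        # Gain of a triple = number of new topic entities it would cover.
        best = max(
            remaining,
            key=lambda t: len(({t[0], t[2]} & topics) - covered),
        )
        if not ({best[0], best[2]} & topics) - covered:
            break  # no remaining triple adds coverage
        chosen.append(best)
        covered |= {best[0], best[2]} & topics
        remaining.remove(best)
    return chosen, covered


def build_graph(triples):
    """Build an adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)


def coverage_rate(covered, topics):
    """Fraction of topic entities that appear in the selected graph."""
    return len(covered) / len(topics) if topics else 0.0
```

With the sample data, `filter_triples(TRIPLES, TOPICS)` drops the triple whose object is the uninformative pronoun "it", and `select_edges(..., max_edges=2)` picks two edges that together touch all four topic entities. A real system would need a richer notion of "meaningful" than this membership test.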

[1] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[2] Tao Li, et al. Multi-document summarization via submodularity, 2012, Applied Intelligence.

[3] Roberto Navigli, et al. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity, 2013, ACL.

[4] Oren Etzioni, et al. Open Information Extraction from the Web, 2007, CACM.