Paper information - Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events

Real-world events, such as historic incidents, typically contain both spatial and temporal aspects and involve a specific group of persons. This is reflected in the descriptions of events in textual sources, which contain mentions of named entities and dates. Given a large collection of documents, however, such descriptions may be incomplete in a single document, or spread across multiple documents. In these cases, it is beneficial to leverage partial information about the entities that are involved in an event to extract missing information. In this paper, we introduce the LOAD model for cross-document event extraction in large-scale document collections. The graph-based model relies on co-occurrences of named entities belonging to the classes locations, organizations, actors, and dates, and puts them in the context of surrounding terms. As such, the model allows for efficient queries and can be updated incrementally in negligible time to reflect changes to the underlying document collection. We discuss the versatility of this approach for event summarization, the completion of partial event information, and the extraction of descriptions for named entities and dates. We create and provide a LOAD graph for the documents in the English Wikipedia from named entities extracted by state-of-the-art NER tools. We evaluate the resulting graph on an evaluation set of historic data that includes summaries of diverse events. We find that the model not only allows for near real-time retrieval of information from the underlying document collection, but also provides a comprehensive framework for browsing and summarizing event data.
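To make the idea of a co-occurrence graph over entities and terms more concrete, the following is a minimal sketch and not the authors' implementation. It assumes sentence-level co-occurrence and plain counts as edge weights, whereas the paper may define a different context window and weighting. The class name `CooccurrenceGraph`, the labels `LOC`, `ORG`, `ACT`, `DAT`, `TER`, and the example sentences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Entity classes named in the abstract: locations, organizations, actors, dates.
# The short labels are an assumption made for this sketch.
ENTITY_CLASSES = {"LOC", "ORG", "ACT", "DAT"}


class CooccurrenceGraph:
    """Toy entity/term co-occurrence graph (assumed design, not the paper's)."""

    def __init__(self):
        # adjacency map: node -> neighbour -> co-occurrence count
        self.adj = defaultdict(lambda: defaultdict(int))

    def add_sentence(self, entities, terms):
        """Incrementally add one sentence to the graph.

        entities: iterable of (class_label, name) tuples
        terms:    iterable of context terms from the same sentence
        """
        nodes = sorted({(c, n) for c, n in entities if c in ENTITY_CLASSES})
        term_nodes = sorted({("TER", t.lower()) for t in terms})
        # Connect every pair of entities mentioned together.
        for a, b in combinations(nodes, 2):
            self.adj[a][b] += 1
            self.adj[b][a] += 1
        # Connect every entity to the surrounding terms.
        for e in nodes:
            for t in term_nodes:
                self.adj[e][t] += 1
                self.adj[t][e] += 1

    def neighbours(self, node, target_class, k=3):
        """Return up to k neighbours of `node` in `target_class`, highest count first."""
        candidates = self.adj.get(node, {})
        ranked = sorted(
            ((n, c) for n, c in candidates.items() if n[0] == target_class),
            key=lambda item: (-item[1], item[0]),
        )
        return ranked[:k]


# Illustrative input only; these sentences are made up for the example.
graph = CooccurrenceGraph()
graph.add_sentence(
    [("ACT", "Neil Armstrong"), ("LOC", "Moon"), ("DAT", "1969-07-20")],
    ["landed", "mission"],
)
graph.add_sentence(
    [("ACT", "Neil Armstrong"), ("DAT", "1969-07-20")],
    ["landed"],
)
# Complete partial event information: which dates co-occur with this actor?
print(graph.neighbours(("ACT", "Neil Armstrong"), "DAT"))
# Describe the actor through its most frequent context terms.
print(graph.neighbours(("ACT", "Neil Armstrong"), "TER"))
```

Because each sentence only touches the edges between its own entities and terms, adding new documents does not require rebuilding the graph, which is one plausible reading of the incremental-update property described in the abstract.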

Andreas Spitz | Michael Gertz
