Extraction of Main Event Descriptors from News Articles by Answering the Journalistic Five W and One H Questions

Many projects that analyze news articles must first identify and extract the events those articles report on. However, because no universally usable, publicly available methods exist for this task, researchers often have to implement event extraction redundantly within their own projects. Answers to the journalistic five W and one H questions (5W1H) describe the main event of a news story, i.e., who did what, when, where, why, and how. We propose Giveme5W1H, an open-source system that uses syntactic and domain-specific rules to extract phrases answering the 5W1H questions. In our evaluation, the extraction precision is p=0.64 over all 5W1H phrases and p=0.79 for the first four W questions, which discretely describe an event.
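To make the target output concrete, the following minimal sketch shows one way the six answers for a single article could be represented. It is an illustration only and not the Giveme5W1H API: the class name FiveW1H, the helper answered_questions, and the example phrases are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FiveW1H:
    """Hypothetical container for the main-event descriptors of one article.

    Each field holds the extracted phrase answering that question,
    or None if no answer was found.
    """
    who: Optional[str] = None
    what: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None


def answered_questions(event: FiveW1H) -> List[str]:
    """Return the names of the questions that have an answer, in field order."""
    return [f.name for f in fields(event) if getattr(event, f.name) is not None]


# Invented example phrases, not taken from any evaluated article.
event = FiveW1H(
    who="The city council",
    what="approved a new transit budget",
    when="on Tuesday",
    where="in Springfield",
)

print(answered_questions(event))
```

In this example only the first four W questions are answered, while why and how remain unset.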