Automatic Aspect Identification: The Case of Informative Microaspects in News Texts

Informative aspects represent the basic units of information in texts. In news texts, for example, they can capture what happened, when it happened, and where it happened. Once these aspects are identified, it becomes possible to automate some NLP tasks, such as Summarization, Question Answering, and Information Extraction. Microaspects, a type of informative aspect, correspond to local segments of a sentence. In this paper, we automatically identify microaspects using Semantic Role Labeling, Named-Entity Recognition, handcrafted rules, and Machine Learning techniques. We evaluate our proposal on the CSTNews journalistic corpus, in which aspects are manually annotated. The results are satisfactory and indicate that microaspects can be automatically identified in news texts with acceptable performance.