A learning-based model for headline extraction of news articles to find explanatory sentences for events

Metadata plays a crucial role in making documents easier to organise and archive. News metadata includes the DateLine, ByLine, HeadLine and many other fields. We found that the HeadLine is useful for inferring the theme of a news article. For financial news articles in particular, the HeadLine can be especially helpful for locating sentences that explain major events such as significant changes in stock prices. In this paper we explore a support-vector-based learning approach to extract the HeadLine metadata automatically. We find that the classification accuracy for HeadLines improves if DateLines are identified first. We then use the extracted HeadLines to initiate keyword pattern matching that finds the sentences responsible for the story theme. Using this theme and a simple language model, it is possible to locate explanatory sentences for a significant price change.
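
The keyword-matching step that follows HeadLine extraction can be illustrated with a minimal sketch. It assumes the HeadLine has already been extracted and treats "pattern matching of keywords" as plain keyword overlap between the HeadLine and each body sentence. The stop-word list, the naive sentence splitter, the function names and the `min_overlap` threshold are all hypothetical choices made for illustration, not the patterns used in the paper.

```python
import re

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "to", "for",
              "and", "as", "after", "by", "its", "is", "at"}


def keywords(text):
    """Return the lower-cased word tokens of `text`, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS}


def split_sentences(body):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]


def theme_sentences(headline, body, min_overlap=2):
    """Return (overlap, sentence) pairs for body sentences that share at
    least `min_overlap` keywords with the headline, highest overlap first.
    Ties keep their original document order."""
    head = keywords(headline)
    scored = []
    for sentence in split_sentences(body):
        overlap = len(head & keywords(sentence))
        if overlap >= min_overlap:
            scored.append((overlap, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Invented example text, used only to exercise the functions.
    headline = "Acme shares plunge after profit warning"
    body = ("Acme Corp issued a profit warning on Monday. "
            "The weather was mild in London. "
            "Shares of Acme fell 12 percent after the warning.")
    for overlap, sentence in theme_sentences(headline, body):
        print(overlap, sentence)
```

In this sketch, sentences that share several keywords with the HeadLine are kept as candidate theme sentences, while unrelated sentences are dropped. A language model, as mentioned in the abstract, would then be applied to these candidates to pick out the explanatory ones; that stage is not shown here.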