Learning to Summarize Time Series Data

In this paper we focus on content selection for summarizing time series data using machine learning techniques. The goal is to exploit a parallel corpus to predict the level of abstraction that a summarization task requires. This is an important step towards building an automated Natural Language Generation (NLG) system that can generate text for unseen data. Machine learning approaches are used to induce the underlying rules for text summarization, which may be close to the rules that humans use when writing textual summaries. We present an approach to selecting the important points in a time series, which can aid in generating captions or textual summaries. We evaluate our techniques on a parallel corpus of human-generated weather forecast text paired with numerical weather prediction data.
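To make "selecting important points in a time series" concrete, here is a minimal sketch of a hand-written heuristic. It is not the learned, corpus-driven approach this paper proposes. It keeps the first and last points plus interior local extrema (peaks and troughs) whose change from the previously selected point is at least a threshold. The function name `select_important_points`, the `min_change` parameter, and the sample wind-speed values are hypothetical and exist only for illustration. A learned content selector could replace the fixed rule with a classifier trained on which points human forecasters actually mention.

```python
from typing import List, Sequence


def select_important_points(values: Sequence[float], min_change: float = 0.0) -> List[int]:
    """Hypothetical heuristic baseline, not the paper's learned method.

    Returns the indices of the first point, the last point, and every interior
    local extremum whose absolute change from the previously selected point is
    at least `min_change`.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0]

    selected = [0]  # always keep the starting value
    for i in range(1, n - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        is_peak = cur > prev and cur >= nxt
        is_trough = cur < prev and cur <= nxt
        # Keep an extremum only if it differs enough from the last kept point,
        # which filters out small fluctuations a summary would not mention.
        if (is_peak or is_trough) and abs(cur - values[selected[-1]]) >= min_change:
            selected.append(i)
    selected.append(n - 1)  # always keep the final value
    return selected


if __name__ == "__main__":
    # Hypothetical hourly wind speeds (knots).
    wind = [10, 12, 15, 14, 9, 9, 11, 18, 16]
    print(select_important_points(wind, min_change=3.0))  # [0, 2, 4, 7, 8]
```

Raising `min_change` yields a coarser summary with fewer selected points. This loosely mirrors the idea of choosing a level of abstraction, although here the threshold is fixed by hand rather than predicted from a parallel corpus.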