Generating English summaries of time series data using the Gricean maxims

We are developing technology for generating English textual summaries of time-series data in three domains: weather forecasts, gas-turbine sensor readings, and hospital intensive care data. Our weather-forecast generator is currently operational and is used daily by a meteorological company. We generate summaries in three steps: (a) selecting the most important trends and patterns to communicate; (b) mapping these patterns onto words and phrases; and (c) generating actual texts based on these words and phrases. In this paper we focus on step (a), selecting the information to communicate, and describe how we perform it using modified versions of standard data analysis algorithms such as segmentation. The modifications arose out of empirical work with users and domain experts, and can all be regarded as applications of the Gricean maxims of Quality, Quantity, Relevance, and Manner, which describe how a cooperative speaker should behave in order to help a hearer correctly interpret a text. The Gricean maxims are perhaps a key element in adapting data analysis algorithms for effective communication of information to human users, and other researchers interested in communicating data to human users should consider them.
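
To make step (a) concrete, the following is a minimal sketch of a standard bottom-up linear segmentation, the kind of data analysis algorithm the abstract says is adapted. It is not the authors' modified algorithm, whose details are not given here. The function names, the error measure (maximum deviation from the straight line joining a segment's endpoints), and the `max_error` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def interpolation_error(values: Sequence[float], start: int, end: int) -> float:
    """Largest absolute deviation of values[start..end] from the straight
    line joining the two endpoint values (an assumed error measure)."""
    span = end - start
    if span < 2:
        return 0.0
    worst = 0.0
    for i in range(start + 1, end):
        expected = values[start] + (values[end] - values[start]) * (i - start) / span
        worst = max(worst, abs(values[i] - expected))
    return worst


def bottom_up_segments(values: Sequence[float], max_error: float) -> List[Tuple[int, int]]:
    """Merge adjacent segments greedily, cheapest merge first, until the
    cheapest remaining merge would exceed max_error (hypothetical threshold).
    Returns (start, end) index pairs; neighbouring segments share an endpoint."""
    if len(values) < 2:
        return []
    # Start from the finest segmentation: one segment per adjacent pair.
    segments = [(i, i + 1) for i in range(len(values) - 1)]
    while len(segments) > 1:
        costs = [
            interpolation_error(values, segments[k][0], segments[k + 1][1])
            for k in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


# Illustrative (made-up) readings: a steady rise followed by a flat period.
readings = [10, 12, 14, 16, 16, 16, 16]
print(bottom_up_segments(readings, max_error=0.5))
```

In this sketch a larger `max_error` yields fewer, coarser segments and so a shorter summary. That threshold is one place where considerations such as the maxim of Quantity could plausibly enter, though this is a reading of the abstract rather than a description of the authors' method.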
