An Architecture for Data-to-Text Systems

I present an architecture for data-to-text systems, that is, natural language generation (NLG) systems that produce texts from non-linguistic input data. It essentially extends the architecture of Reiter and Dale (2000) to systems whose input is raw data rather than AI knowledge bases. The architecture is being used in the BabyTalk project and is based on experience with several projects at Aberdeen; it also appears to be compatible with many data-to-text systems developed elsewhere. It consists of four stages organised as a pipeline: Signal Analysis, Data Interpretation, Document Planning, and Microplanning and Realisation.
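To make the data flow through the four stages concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the function names (`signal_analysis`, `data_interpretation`, `document_planning`, `microplan_and_realise`, `data_to_text`), the jump threshold and the importance scores are hypothetical and are not taken from BabyTalk or any other system. A real data-to-text system would need much richer signal analysis and domain knowledge. The point is only that each stage consumes the output of the previous one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pattern:
    """Low-level pattern detected in the raw signal (hypothetical representation)."""
    kind: str      # "jump" between adjacent samples, or "trend" over the whole series
    start: int     # index of the first sample involved
    end: int       # index of the last sample involved
    change: float  # signed change in value across the pattern


@dataclass
class Message:
    """Domain-level message produced by data interpretation (hypothetical)."""
    kind: str          # "trend" or "event"
    start: int
    end: int
    change: float
    importance: float  # how much the reader needs to be told about this


def signal_analysis(series: List[float], jump: float = 20.0) -> List[Pattern]:
    """Stage 1: find patterns in the raw numbers (toy rules, assumed for illustration)."""
    if not series:
        return []
    patterns = [Pattern("trend", 0, len(series) - 1, series[-1] - series[0])]
    for i in range(1, len(series)):
        change = series[i] - series[i - 1]
        if abs(change) > jump:  # sudden change between consecutive samples
            patterns.append(Pattern("jump", i - 1, i, change))
    return patterns


def data_interpretation(patterns: List[Pattern]) -> List[Message]:
    """Stage 2: decide what patterns mean; here jumps become events and
    importance is simply proportional to the size of the change (an assumption)."""
    messages = []
    for p in patterns:
        if p.kind == "jump":
            messages.append(Message("event", p.start, p.end, p.change, abs(p.change) / 10))
        else:
            messages.append(Message("trend", p.start, p.end, p.change, abs(p.change) / 20))
    return messages


def document_planning(messages: List[Message], min_importance: float = 1.0) -> List[Message]:
    """Stage 3: select important messages; put the trend first, then events in time order."""
    selected = [m for m in messages if m.importance >= min_importance]
    return sorted(selected, key=lambda m: (m.kind != "trend", m.start))


def microplan_and_realise(plan: List[Message]) -> str:
    """Stage 4: choose words for each message and produce the final text."""
    sentences = []
    for m in plan:
        direction = "rose" if m.change > 0 else "fell"
        if m.kind == "trend":
            sentences.append(f"Overall, the value {direction} by {abs(m.change):g}.")
        else:
            sentences.append(
                f"Between samples {m.start} and {m.end} it suddenly {direction} by {abs(m.change):g}."
            )
    return " ".join(sentences)


def data_to_text(series: List[float]) -> str:
    """Run the four stages as a pipeline, each consuming the previous stage's output."""
    return microplan_and_realise(
        document_planning(data_interpretation(signal_analysis(series)))
    )
```

For example, `data_to_text([140, 142, 141, 175, 150, 160])` returns "Overall, the value rose by 20. Between samples 2 and 3 it suddenly rose by 34. Between samples 3 and 4 it suddenly fell by 25." Keeping the stages as separate functions with explicit intermediate types means each stage can be inspected or replaced independently, which is one practical benefit of a pipeline organisation.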

References

[1] Ehud Reiter et al. Book Reviews: Building Natural Language Generation Systems, 2000, CL.

[2] Karen Kukich et al. Design of a Knowledge-Based Report Generator, 1983, ACL.

[3] Chris Mellish et al. Choosing the content of textual summaries of large time-series data sets, 2006, Natural Language Engineering.

[4] Donia Scott et al. Structural variation in generated health reports, 2005, IWP@IJCNLP.

[5] Jim Hunter et al. Generating English summaries of time series data using the Gricean maxims, 2003, KDD '03.

[6] Anna S. Law et al. A Comparison of Graphical and Textual Presentations of Time Series Data to Support Medical Decision Making in the Neonatal Intensive Care Unit, 2005, Journal of Clinical Monitoring and Computing.

[7] Jim Hunter et al. Choosing words in computer-generated weather forecasts, 2005, Artif. Intell.

[8] Ehud Reiter et al. Generating Spatio-Temporal Descriptions in Pollen Forecasts, 2006, EACL.

[9] Fabio Pianesi et al. Multimodal support to group dynamics, 2007, Personal and Ubiquitous Computing.

[10] Paul Piwek et al. What is NLG?, 2002, INLG.

[11] James Shaw et al. Practical Issues in Automatic Documentation Generation, 1994, ANLP.

[12] Jim Hunter et al. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data, 2007, AIME.

[13] Jim Hunter et al. TSNet - A Distributed Architecture for Time Series Analysis, 2008, Computer-based Medical Guidelines and Protocols.

[14] Avi Parush et al. Helping People with Visual Impairments Gain Access to Graphical Information Through Natural Language: The iGraph System, 2006, ICCHP.

[15] Owen Rambow et al. On the need for domain communication knowledge, 1991.

[16] Gerd Herzog et al. VIsual TRAnslator: Linking perceptions and natural language descriptions, 1994, Artificial Intelligence Review.

[17] Yuval Shahar et al. A Framework for Knowledge-Based Temporal Abstraction, 1997, Artif. Intell.

[18] Richard I. Kittredge et al. Using natural-language processing to produce weather forecasts, 1994, IEEE Expert.

[19] Eamonn J. Keogh et al. An online algorithm for segmenting time series, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[20] Chris Mellish et al. A Reference Architecture for Natural Language Generation Systems, 2006, Natural Language Engineering.