On the role of linguistic descriptions of data in the building of natural language generation systems

This paper reviews the current state of the task of using natural language to convey data to people as easily understandable information. The task is currently addressed by two independent research fields: natural language generation (more specifically, its data-to-text subfield) and linguistic descriptions of data. Both approaches are described in detail, covering: i) a methodological review of each field, including basic concepts and definitions, models, and evaluation procedures; and ii) the most relevant systems, use cases, and real applications described in the literature. Reflections on the current state and future trends of each field are also provided, followed by concluding remarks that point to potential areas of mutual interest and convergence between the two fields.
