Software Re-Use and Evolution in Text Generation Applications

A practical goal for natural language text generation research is to converge on a separation of functions into modules that can be independently re-used. This paper addresses issues related to software re-use and evolution in text generation systems. We describe the benefits we obtained by adapting and generalizing the generation modules and techniques we used in the successive development of three distinct text generation applications: PLANDoc, FLOWDoc, and ZEDDoc. We suggest that design principles such as a common, modular pipeline architecture, a consistent and general data representation format, and domain-independent algorithms for generation subtasks, together with component re-use and adaptation, facilitate both application development and research in the field. In our experience, these principles significantly reduced development time for successive applications, from three years to one year to six months, respectively. They also enabled us to isolate domain-specific knowledge and devise re-usable, domain-independent algorithms for generation tasks such as ontological generalization and discourse structuring.

† The authors wish to acknowledge Jacques Robin, James Shaw, Jong Lira, and Larry Lefkowitz, who also played essential roles in the design and development of PLANDoc and FLOWDoc.

Recent technological advances, such as the widespread use of the World Wide Web and ready access to many large-scale databases, have created novel opportunities for practical text generation applications. To take full advantage of these opportunities, however, text generation systems must be easily adaptable to new domains, changing data formats, and distinct underlying ontologies. One crucial factor in making text generation systems more general, and in turn practically and commercially viable, is the adaptation and re-use of existing generation modules together with the development of re-usable tools and techniques.
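The first three design principles named above (a common pipeline, a shared data representation, and domain knowledge kept out of the algorithms) can be illustrated with a minimal Python sketch. It is not the PLANDoc code: the stage names (`select_content`, `lexicalize`, `realize`), the `DomainKnowledge` container, and the dictionary-based message format are all hypothetical, chosen only to show how domain-independent stages can share one data representation while every domain-specific detail is passed in as a separate, swappable object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shared representation: every stage consumes and produces a list
# of attribute-value "messages", so stages can be re-used across applications.
Message = Dict[str, str]


@dataclass
class DomainKnowledge:
    """Hypothetical container that isolates everything domain-specific."""
    name: str
    lexicon: Dict[str, str]               # domain concept -> surface phrase
    important: Callable[[Message], bool]  # domain-specific content filter


Stage = Callable[[List[Message], DomainKnowledge], List[Message]]


def select_content(msgs: List[Message], dk: DomainKnowledge) -> List[Message]:
    # Domain-independent loop; only the selection criterion comes from dk.
    return [m for m in msgs if dk.important(m)]


def lexicalize(msgs: List[Message], dk: DomainKnowledge) -> List[Message]:
    # Map concept identifiers to domain phrases; unknown concepts pass through.
    return [{**m, "concept": dk.lexicon.get(m["concept"], m["concept"])} for m in msgs]


def realize(msgs: List[Message], dk: DomainKnowledge) -> List[Message]:
    # Trivial template realizer standing in for a real surface generator.
    out = []
    for m in msgs:
        subject = m["concept"][:1].upper() + m["concept"][1:]
        out.append({"text": f"{subject} {m['event']}."})
    return out


@dataclass
class Pipeline:
    stages: List[Stage] = field(
        default_factory=lambda: [select_content, lexicalize, realize])

    def run(self, msgs: List[Message], dk: DomainKnowledge) -> List[str]:
        for stage in self.stages:
            msgs = stage(msgs, dk)
        return [m["text"] for m in msgs]


if __name__ == "__main__":
    # The same pipeline serves two hypothetical domains; only dk changes.
    ads = DomainKnowledge(
        name="web-ads",
        lexicon={"ad42": "the banner ad"},
        important=lambda m: int(m.get("hits", "0")) > 100,
    )
    network = DomainKnowledge(name="network", lexicon={}, important=lambda m: True)
    pipeline = Pipeline()
    print(pipeline.run([{"concept": "ad42", "event": "received 250 hits", "hits": "250"},
                        {"concept": "ad7", "event": "received 3 hits", "hits": "3"}], ads))
    print(pipeline.run([{"concept": "fiber", "event": "is placed in 1996"}], network))
```

Because `run` only iterates over `stages`, a new application could, under this sketch's assumptions, replace a single stage (for example, a richer realizer) without touching the others, and move to a new domain by supplying a different `DomainKnowledge` object.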
In this paper, we focus on the lessons learned during the successive development of three text generation systems at Bellcore. PLANDoc (McKeown et al., 1994) summarizes execution traces of an expert system for telephone network capacity expansion analysis; FLOWDoc (Passonneau et al., 1996) summarizes the most important events in flow diagrams constructed during business reengineering; and ZEDDoc (Passonneau et al., 1997) produces, from logs of WWW page hits, summaries of activity for a user-specified set of advertisements within a user-specified time period. We built FLOWDoc and ZEDDoc by adapting components of the PLANDoc system. Transferring the original PLANDoc modules to new domains led us to replace some hard-coded rules and ontological knowledge with more general, domain-independent components. This encapsulation, or "plug-and-play" feature, enabled the transfer of many of FLOWDoc's modules to ZEDDoc.
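Ontological generalization, one of the domain-independent tasks mentioned in the abstract, can be sketched in the same plug-and-play spirit. The technique below, which computes the most specific concept subsuming a set of entities (their lowest common ancestor in a taxonomy), is one plausible approach and is not claimed to be the algorithm used in FLOWDoc or ZEDDoc; the function names, taxonomies, and concept names are hypothetical. The point is that `generalize` stays the same across domains, while the taxonomy is supplied as data.

```python
from typing import Dict, List, Optional

# A taxonomy maps each concept to its parent; roots map to None.
# Assumption: the taxonomy is a tree (acyclic); unknown concepts act as roots.
Taxonomy = Dict[str, Optional[str]]


def ancestors(concept: str, taxonomy: Taxonomy) -> List[str]:
    """Return the concept followed by its ancestors, most specific first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = taxonomy.get(node)
    return chain


def generalize(concepts: List[str], taxonomy: Taxonomy) -> Optional[str]:
    """Most specific concept subsuming all inputs, or None if there is none."""
    if not concepts:
        return None
    common = ancestors(concepts[0], taxonomy)
    for concept in concepts[1:]:
        shared = set(ancestors(concept, taxonomy))
        common = [a for a in common if a in shared]  # keeps specific-first order
    return common[0] if common else None


if __name__ == "__main__":
    # Two hypothetical domain taxonomies plugged into the same algorithm.
    web_ads: Taxonomy = {
        "banner_ad_1": "banner_ad", "banner_ad_2": "banner_ad",
        "popup_ad_1": "popup_ad",
        "banner_ad": "advertisement", "popup_ad": "advertisement",
        "advertisement": None,
    }
    network: Taxonomy = {
        "fiber_cable": "facility", "copper_cable": "facility", "facility": None,
    }
    print(generalize(["banner_ad_1", "banner_ad_2"], web_ads))  # banner_ad
    print(generalize(["banner_ad_1", "popup_ad_1"], web_ads))   # advertisement
    print(generalize(["fiber_cable", "copper_cable"], network))  # facility
```

A generator built this way could say "two banner ads" instead of listing both, falling back to the broader "advertisement" when the entities belong to different subtypes; deciding how far up the taxonomy to go is a separate policy question that this sketch leaves out.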

[1] H. Mack. Coherence and Structure. 1974.

[2] Laurence Danlos, et al. The Linguistic Basis of Text Generation. EACL, 1987.

[3] J. Hobbs. On the Coherence and Structure of Discourse. 1985.

[4] Karen Kukich, et al. Knowledge-based Report Generation: A Knowledge Engineering Approach to Natural Language Report Generation. 1983.

[5] Kathleen R. McKeown, et al. The Need for Text Generation. 1985.

[6] Kathleen R. McKeown, et al. Building a Rich Large-scale Lexical Base for Generation. 1997.

[7] John T. Cunningham, et al. New Jersey. The Journal of Comparative Medicine and Veterinary Archives, 1896.

[8] R. Passonneau. Using Centering to Relax Gricean Informational Constraints on Discourse Anaphoric Noun Phrases. 1996.

[9] Kathleen McKeown, et al. Tailoring Lexical Choice to the User's Vocabulary in Multimedia Explanation Generation. ACL, 1993.

[10] Eleanor Rosch, et al. Principles of Categorization. 1978.

[11] James Shaw. Conciseness through Aggregation in Text Generation. ACL, 1995.

[12] James Pustejovsky, et al. Description-directed Natural Language Generation. IJCAI, 1985.

[13] K. Kukich, et al. User-Needs Analysis and Design Methodology for an Automated Documentation Generator. 1993.

[14] Ehud Reiter, et al. Has a Consensus NL Generation Architecture Appeared, and Is It Psycholinguistically Plausible? INLG, 1994.

[15] Philip Resnik, et al. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI, 1995.

[16] James Shaw, et al. Practical Issues in Automatic Documentation Generation. ANLP, 1994.

[17] Jacques Robin, et al. Revision-based Generation of Natural Language Summaries Providing Historical Background: Corpus-based Analysis, Design, Implementation and Evaluation. 1995.