The rhetorical parsing, summarization, and generation of natural language texts

This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, the computational means of deriving that structure, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of a sequence of units among which some rhetorical relations hold. Two algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques. Together, the formalization and these algorithms constitute the theoretical facet of the thesis.

An exploratory corpus analysis of cue phrases provides the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms was designed to determine the elementary textual units of a text, to hypothesize rhetorical relations that hold among these units, and, eventually, to derive the discourse structure of that text. The process that finds the discourse structure of unrestricted natural language texts is called rhetorical parsing.

The thesis explores two possible applications of the text theory that it proposes. The first application concerns a discourse-based summarization system, which is shown to significantly outperform both a baseline algorithm and a commercial system. An empirical psycholinguistic experiment not only provides an objective evaluation of the summarization system, but also confirms the adequacy of using the text theory proposed here to determine the most important units in a text. The second application concerns a set of text planning algorithms that natural language generation systems can use to construct text plans in cases where the high-level communicative goal is to map an entire knowledge pool into text.
