Incorporating Punctuation Into the Sentence Grammar: A Lexicalized Tree Adjoining Grammar Perspective

Christine D. Doran. Supervisor: Aravind K. Joshi.

Punctuation helps us to structure, and thus to understand, texts. Many uses of punctuation straddle the line between syntax and discourse, because they serve to combine multiple propositions within a single orthographic sentence. They allow us to insert discourse-level relations at the level of a single sentence. Just as people make use of information from punctuation in processing what they read, computers can use information from punctuation in processing texts automatically. Most current natural language processing systems fail to take punctuation into account at all, losing a valuable source of information about the text. Those which do mostly do so in a superficial way, again failing to fully exploit the information conveyed by punctuation. To be able to make use of such information in a computational system, we must first characterize its uses and find a suitable representation for encoding them.

The work here focuses on extending a syntactic grammar to handle phenomena occurring within a single sentence which have punctuation as an integral component. Punctuation marks are treated as full-fledged lexical items in a Lexicalized Tree Adjoining Grammar (LTAG), which is an extremely well-suited formalism for encoding punctuation in the sentence grammar. Each mark anchors its own elementary trees and imposes constraints on the surrounding lexical items. I have analyzed data representing a wide variety of constructions, and added treatments of them to the large English grammar which is part of the XTAG system. The advantages of using LTAG are that its elementary units are structured trees of a suitable size for stating the constraints we are interested in, and the derivation histories it produces contain information the discourse grammar will need about which elementary units have been used and how they have been combined. I also consider in detail a few particularly interesting constructions where the sentence and discourse grammars meet: appositives, reported speech and uses of parentheses. My results confirm that punctuation can
