Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System

The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gsearch architecture also allows interfacing with external linguistic resources (such as taggers and lexical databases). Gsearch can be used with graphical tools for visualizing the results of a query.
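The idea of querying an unparsed corpus with a chart parser can be illustrated with a minimal sketch. It is not Gsearch's implementation, grammar format, or query syntax: the names `build_chart` and `search`, the toy grammar, and the reduction of a search expression to a single goal category are all assumptions made for illustration. The sketch assumes the corpus is already part-of-speech tagged, builds a bottom-up chart over the tags of each sentence, and returns every word span that the grammar analyses as the goal category.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (left-hand side, non-empty right-hand side)
Edge = Tuple[int, int, str]          # (start, end, category)
Sentence = List[Tuple[str, str]]     # [(word, tag), ...]

def build_chart(tags: List[str], grammar: List[Rule]) -> Set[Edge]:
    """Bottom-up chart: add edges until no rule yields a new one."""
    chart: Set[Edge] = {(i, i + 1, t) for i, t in enumerate(tags)}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            # Spans covering the first right-hand-side symbol ...
            spans = {(s, e) for (s, e, c) in chart if c == rhs[0]}
            # ... extended with adjacent edges for each remaining symbol.
            for sym in rhs[1:]:
                spans = {(s, e2) for (s, e) in spans
                         for (s2, e2, c) in chart if s2 == e and c == sym}
            for s, e in spans:
                if (s, e, lhs) not in chart:
                    chart.add((s, e, lhs))
                    changed = True
    return chart

def search(corpus: List[Sentence], grammar: List[Rule], goal: str) -> List[str]:
    """Return the word spans that the grammar analyses as `goal`."""
    hits = []
    for sent in corpus:
        chart = build_chart([tag for _, tag in sent], grammar)
        for s, e, cat in sorted(chart):
            if cat == goal:
                hits.append(" ".join(word for word, _ in sent[s:e]))
    return hits

# Hypothetical toy grammar over Penn Treebank style tags.
GRAMMAR: List[Rule] = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "JJ", "NN")),
    ("PP", ("IN", "NP")),
    ("NP", ("NP", "PP")),
]

# Hypothetical tagged corpus; no syntactic markup is present.
CORPUS: List[Sentence] = [
    [("the", "DT"), ("dog", "NN"), ("slept", "VBD"),
     ("in", "IN"), ("the", "DT"), ("old", "JJ"), ("barn", "NN")],
    [("she", "PRP"), ("left", "VBD")],
]

print(search(CORPUS, GRAMMAR, "PP"))
```

Because the grammar is supplied at query time, the same tagged corpus can be searched with a different grammar or goal category without any re-annotation, which is the property the abstract attributes to the system.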
