Treebanking in VIT: from Phrase Structure to Dependency Representation

This chapter deals with treebanks: which ones exist and in which fields they are applied. We then describe VIT (Venice Italian Treebank), focusing on the syntactic-semantic features of the treebank, which depend partly on the adopted tagset, partly on the reference linguistic theory and, as in every treebank, on the chosen language, here Italian. By discussing examples taken from treebanks available for other languages, we show the theoretical and practical differences and motivations that lie behind our approach. Finally, we discuss a quantitative analysis of the data in our treebank and compare it with other treebanks. Overall, we try to substantiate the claim that treebanking grammars or parsers depends heavily on the chosen treebank, and that this process in turn appears to depend on substantial factors such as the linguistic framework adopted for structural description and, ultimately, the language described.
