Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin

Despite a centuries-long tradition in lexicography, Latin lacks state-of-the-art computational lexical resources. This situation is closely related to the still limited amount of linguistically annotated textual data for Latin, since such data can support the building of new lexical resources with empirical evidence. Over the last decade, however, projects for creating new language resources for Latin have been launched to fill this gap. In this paper, we present Latin Vallex, a valency lexicon for Latin built in mutual connection with the semantic and pragmatic annotation of two Latin treebanks featuring texts from different eras. On the one hand, this connection between the empirical evidence provided by the treebanks and the lexicon makes it possible to enhance each frame entry in the lexicon with its frequency in real data. On the other hand, each valency-capable word in the treebanks is linked to a frame entry in the lexicon.
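The two-way link between lexicon and treebank can be illustrated with a minimal sketch. It assumes a simple in-memory representation in which a frame entry is a lemma plus a tuple of argument slots, and each annotated treebank token carries the identifier of the frame it instantiates. The class names, frame identifiers, and the PDT-style functor labels (ACT, PAT, ADDR) used below are illustrative assumptions, not the actual Latin Vallex data format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame entry: a lemma and its ordered argument slots (functors)."""
    frame_id: str
    lemma: str
    functors: tuple[str, ...]


@dataclass
class Lexicon:
    """Hypothetical lexicon whose frame frequencies are derived from treebank links."""
    frames: dict[str, Frame] = field(default_factory=dict)
    frequency: Counter = field(default_factory=Counter)

    def add_frame(self, frame: Frame) -> None:
        self.frames[frame.frame_id] = frame

    def link(self, token_lemma: str, frame_id: str) -> Frame:
        """Link one treebank occurrence to a frame entry and count that occurrence."""
        frame = self.frames[frame_id]  # raises KeyError for an unknown frame
        if frame.lemma != token_lemma:
            raise ValueError(
                f"frame {frame_id} belongs to {frame.lemma!r}, not {token_lemma!r}"
            )
        self.frequency[frame_id] += 1
        return frame

    def frames_for(self, lemma: str) -> list[tuple[Frame, int]]:
        """Return a lemma's frames with their corpus frequency, most frequent first."""
        entries = [f for f in self.frames.values() if f.lemma == lemma]
        return sorted(
            ((f, self.frequency[f.frame_id]) for f in entries),
            key=lambda pair: pair[1],
            reverse=True,
        )


if __name__ == "__main__":
    lexicon = Lexicon()
    # Two hypothetical frames for the same verb lemma.
    lexicon.add_frame(Frame("do-1", "do", ("ACT", "PAT", "ADDR")))
    lexicon.add_frame(Frame("do-2", "do", ("ACT", "PAT")))
    # Hypothetical annotated tokens: (lemma, linked frame id).
    annotated = [("do", "do-1"), ("do", "do-1"), ("do", "do-2")]
    for lemma, frame_id in annotated:
        lexicon.link(lemma, frame_id)
    for frame, count in lexicon.frames_for("do"):
        print(frame.frame_id, frame.functors, count)
```

Deriving frequencies from the links, rather than storing them as independent numbers, keeps the two directions consistent: every count in the lexicon corresponds to treebank tokens pointing at that frame, and the lemma check rejects a token linked to a frame of a different word.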
