This paper describes a hybrid system (FrAG) for tagging and parsing French text, and presents results from ongoing development work, corpus annotation and evaluation. The core of the system is a sentence-scope Constraint Grammar (CG) with linguist-written rules. Unlike traditional CG, however, the system uses hybrid techniques on both its morphological input side and its syntactic output side. On the input side, FrAG draws on a pre-existing probabilistic Decision Tree Tagger (DTT) before and in parallel with its own lexical stage. On the output side, it feeds its analysis into a Phrase Structure Grammar (PSG) whose rewriting rules use CG syntactic function tags rather than ordinary terminals. As an alternative architecture, dependency tree structures are also supported. In the newest version, dependencies are assigned within the CG framework itself and can interact with other rules. To provide semantic context, a semantic prototype ontology for nouns is used, covering a large part of the lexicon. In a recent test run on Parliamentary debate transcripts, FrAG achieved F-scores of 98.7 % for part of speech (PoS) and between 93.1 % and 96.2 % for syntactic function tags. Dependency links were correct in 95.9 % of cases.
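The idea of a PSG whose rewriting rules operate on CG syntactic function tags rather than on ordinary terminals can be illustrated with a minimal sketch. It is not FrAG's actual grammar or tagset: the tag names (`SUBJ`, `FMV`, `ACC`, `>N`), the rule table `RULES`, and the functions `build_constituents` and `parse` are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Constituent = Tuple[str, List[str]]

# Hypothetical rewriting rules: the right-hand sides are CG-style
# syntactic function tags, not part-of-speech terminals.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("SUBJ", "FMV", "ACC")),  # subject, finite main verb, direct object
    ("S", ("SUBJ", "FMV")),         # subject, finite main verb
]


def build_constituents(tagged: List[Tuple[str, str]]) -> List[Constituent]:
    """Group prenominal dependents (tagged '>N') with the next head token."""
    constituents: List[Constituent] = []
    pending: List[str] = []
    for word, tag in tagged:
        if tag == ">N":
            pending.append(word)  # wait for the head to the right
        else:
            constituents.append((tag, pending + [word]))
            pending = []
    if pending:
        # Dependents without a head: mark them so that no rule matches.
        constituents.append(("?", pending))
    return constituents


def parse(tagged: List[Tuple[str, str]]) -> Optional[Tuple[str, List[Constituent]]]:
    """Return a flat tree if the sequence of function tags matches a rule."""
    constituents = build_constituents(tagged)
    functions = tuple(tag for tag, _ in constituents)
    for lhs, rhs in RULES:
        if rhs == functions:
            return (lhs, constituents)
    return None


# Assumed CG output for "Le chat mange la souris" (tags are illustrative).
sentence = [
    ("Le", ">N"), ("chat", "SUBJ"), ("mange", "FMV"),
    ("la", ">N"), ("souris", "ACC"),
]
tree = parse(sentence)
```

In this sketch, the CG stage is assumed to have already disambiguated each token's function, so the phrase structure rules only need to bracket constituents according to those functions.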