Building a Treebank for French

We present a treebank project for French. We have annotated a one-million-word newspaper corpus with part of speech, inflection, compounds, lemmas and constituency. We describe the tagging and parsing phases of the project and, for each phase, the automatic tools, the guidelines and the validation process. We then present some uses of the corpus as well as some directions for future work.
