A Lexical Database of Portuguese Multiword Expressions

This presentation describes an ongoing project to create a large lexical database of Portuguese multiword (MW) units. The units are automatically extracted from a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, as well as lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a twofold objective: to serve as a research tool that supports the development of typologies of MW units, and to help in developing and evaluating language processing tools capable of dealing with MW expressions.
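As an illustration of what scoring candidates with a lexical association measure can look like, here is a minimal sketch using pointwise mutual information (PMI) over adjacent word pairs. The abstract does not say which measures the project uses, so the choice of PMI, the function name `bigram_pmi`, the `min_count` threshold, and the toy sentence are all assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper: PMI is one common association measure, not
    necessarily the one used to build the database described above.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so filter them out.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first; these would go on to manual validation.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input (assumed); a real run would use a tokenized corpus.
tokens = "o primeiro ministro falou e o primeiro ministro saiu de casa".split()
candidates = bigram_pmi(tokens)
```

On a real corpus, the ranked list would only be a set of candidates: as the abstract notes, the extracted units are still validated by hand.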
