MULTEXT (Multilingual Text Tools and Corpora) is the largest project funded under the Commission of the European Communities' Linguistic Research and Engineering Program. The project will contribute to the development of generally usable software tools to manipulate and analyse text corpora and to create multilingual text corpora with structural and linguistic markup. It will attempt to establish conventions for encoding such corpora, building on and contributing to the preliminary recommendations of the relevant international and European standardization initiatives. MULTEXT will also work towards a set of guidelines for text software development, which will be widely published to enable future development by others. All tools and data developed within the project will be made freely and publicly available.