The compilation of electronic corpora, with special reference to the African Languages

Compiling and querying electronic corpora has become a sine qua non as an empirical basis for contemporary linguistic research. As a result, corpus applications now abound in all fields of linguistics around the world. In this article it is argued that, if African linguistics is to take its rightful place in the new millennium, the active compilation, querying and application of corpora should become an absolute priority. The article first presents a comprehensive theoretical conspectus of electronic corpora. This theoretical section is followed by a practical exploration for the African languages. To that end, two very different African-language corpus projects are described in detail. The survey of these two projects, combined with inter-African-language comparisons, is deemed sufficient proof that establishing a discipline of corpus linguistics for the African languages is feasible at present. (Southern African Linguistics and Applied Language Studies 2000, 18(1-4): 89-106)
