Genre studies of Setswana are rare, particularly those that characterise a genre by its lexical distinctiveness. Such studies have, however, been attempted for other languages (cf. Stamatatos et al., 2000; Xiao and McEnery, 2005). This paper proposes a genre analysis of the sports domain of Setswana. It implements a computational and statistical methodology for retrieving sports terms from a sub-corpus of a large Setswana corpus. The approach is preferred because it is unbiased and measures real language as used by its speakers. It uses frequency and keyword analysis to generate words that are typical and definitive of the sports genre. The strategy has previously been used extensively in the study of a variety of Setswana genres (see Otlogetswe, 2007). Finally, the article demonstrates that such retrieved texts could be useful in a variety of applications, among them genre studies, lexicography and other data retrieval applications.
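As an illustration of the frequency and keyword analysis described above, the following is a minimal sketch only. The abstract does not name the keyness statistic, so the log-likelihood measure used here is an assumption, as are the function names `log_likelihood` and `keywords` and the toy token lists (English placeholders, not Setswana corpus data).

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a: frequency in the study (sports) corpus
    b: frequency in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only positive keywords: overused relative to the reference.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data standing in for the sports sub-corpus
# and the larger reference corpus.
sports_tokens = "the team scored a goal and the team won the match".split()
reference_tokens = (
    "the minister said the budget and the rain helped the farmers and the team"
).split()

for word, score in keywords(sports_tokens, reference_tokens):
    print(f"{word}\t{score:.3f}")
```

With real data, the two token lists would come from the tokenised sports sub-corpus and the remainder of the corpus, and a frequency threshold would normally be applied before ranking so that words occurring only once do not dominate the list.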
[1] Della Summers. LEXICOGRAPHY-The importance of representativeness in relation to frequency, 2022.
[2] Adam Kilgarriff, et al. Putting frequencies in the dictionary, 1997.
[3] Tony Berber Sardinha. Comparing corpora with WordSmith Tools: How large must the reference corpus be?, 2000.
[4] Efstathios Stamatatos, et al. Text Genre Detection Using Common Word Frequencies, 2000, COLING.
[5] Adam Kilgarriff, et al. Corpus Similarity and Homogeneity via Word Frequency, 1996.
[6] Anthony McEnery, et al. Two Approaches to Genre Analysis, 2005.
[7] Marco Baroni, et al. 39 Distributions in Text, 2005.
[8] Pascual Cantos Gómez. Do we need statistics when we have linguistics, 2002.
[9] Paul Rayson, et al. Extending the Cochran rule for the comparison of word frequencies between corpora, 2004.