Corpus design for Setswana lexicography

[1]  Lars Borin A corpus of written Finnish Romani texts , 2000 .

[2]  David Malvern,et al.  Investigating accommodation in language proficiency interviews using a new measure of lexical diversity , 2002 .

[3]  Serge Verlinde,et al.  Corpus-based vs intuition-based lexicography. Defining a word list for a French learner's dictionary , 2001 .

[4]  Adam Kilgarriff,et al.  Which words are particularly characteristic of a text? a survey of statistical approaches , 1996 .

[5]  Anthony McEnery,et al.  Two Approaches to Genre Analysis , 2005 .

[6]  Tamás Váradi The linguistic relevance of corpus linguistics , 2001 .

[7]  Tony Berber Sardinha Comparação de corpora com Wordsmith keywords , 2001 .

[8]  Tom McArthur,et al.  Worlds of reference : lexicography, learning, and language from the clay tablet to the computer , 1986 .

[9]  G. Youmans,et al.  Measuring Lexical Style and Competence: The Type-Token Vocabulary Curve , 1990 .

[10]  Marco Baroni,et al.  39 Distributions in Text , 2005 .

[11]  A. E. Arua,et al.  Patterns of Language Use and Language Preference of some Children and their Parents in Botswana , 2002 .

[12]  Lou Burnard,et al.  Where did we Go Wrong? A Retrospective Look at the British National Corpus , 2002 .

[13]  Branimir Boguraev,et al.  Review of Looking up: an account of the COBUILD project in lexical computing by John M. Sinclair. Collins ELT 1987. , 1990 .

[14]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[15]  H. Alidou Medium of Instruction in Post-Colonial Africa , 2003 .

[16]  Serge Sharoff,et al.  Methods and tools for development of the Russian Reference Corpus , 2006 .

[17]  Nelleke Oostdijk A Corpus Linguistic Approach to Linguistic Variation , 1988 .

[18]  George R. Doddington CSR Corpus Development , 1992, HLT.

[19]  Ralph Grishman,et al.  Dictionaries and balanced corpora: the interdependence of resources , 1998 .

[20]  Rayid Ghani,et al.  Automatic web search query generation to create minority language corpora , 2001, SIGIR '01.

[21]  Michael Oakes,et al.  Statistics for Corpus Linguistics , 1998 .

[22]  Vito Pirrelli,et al.  Corpora and computational lexica: integration of different methodologies of lexical knowledge acquisition , 1994 .

[23]  D. Livingstone,et al.  Missionary Travels and Researches in South Africa , 1857 .

[24]  I. M. de Mönnink Combining Corpus and Experimental Data , 1999 .

[25]  A. R. Mustapha,et al.  Ethnic Structure, Inequality and Governance of the Public Sector in Nigeria. , 2005 .

[26]  Kevin P. Scannell The Crúbadán Project: Corpus building for under-resourced languages , 2007 .

[27]  G. Murdock,et al.  Outline of Cultural Materials , 1983 .

[28]  Ralph Grishman,et al.  The American National Corpus: A Standardized Resource for American English , 2000, LREC.

[29]  Ronald Moe,et al.  Compiling dictionaries using semantic domains , 2010 .

[30]  Rosie Jones,et al.  Automatically Building a Corpus for a Minority Language from the Web , 2000, ACL 2000.

[31]  W. Joss,et al.  W y, , 2022 .

[32]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[33]  Ronald A. Wells,et al.  Dictionaries and the Authoritarian Tradition: Study in English Usage and Lexicography , 1973 .

[34]  Gilles-Maurice de Schryver,et al.  Taking Dictionaries for Bantu Languages into the New Millennium - with special reference to Kiswahili, Sepedi and isiZulu. , 2000 .

[35]  Guy Aston,et al.  The BNC Handbook: Exploring the British National Corpus with SARA , 1998 .

[36]  A. Kilgarriff Comparing Corpora , 2001 .

[37]  Adam Kilgarriff,et al.  Lexical profiling software and its lexicographic applications: a case study , 2002 .

[38]  R. Pagano Understanding Statistics in the Behavioral Sciences , 1981 .

[39]  D. Biber Methodological Issues Regarding Corpus-based Analyses of Linguistic Variation , 1990 .

[40]  Adam Kilgarriff,et al.  Corpus Similarity and Homogeneity via Word Frequency , 1996 .

[41]  G. Leech,et al.  Word Frequencies in Written and Spoken English: based on the British National Corpus , 2001 .

[42]  Susan Conrad,et al.  Corpus linguistics: Index , 1998 .

[43]  Gilles-Maurice de Schryver,et al.  On how electronic dictionaries are really used , 2004 .

[44]  Some lexical features of immersion pupils’ oral and written narration , 2009 .

[45]  Steven J. DeRose An analysis of probabilistic grammatical tagging methods , 1991 .

[46]  Graeme D. Kennedy,et al.  Book Reviews: An Introduction to Corpus Linguistics , 1999, CL.

[47]  Henri Béjoint,et al.  Modern Lexicography: An Introduction , 1994 .

[48]  James Archbeli A grammar of the Bechuana language , 1837 .

[49]  Stig Johansson,et al.  English computer corpora : selected papers and research guide , 1991 .

[50]  Thapelo J. Otlogetswe THE BNC DESIGN AS A MODEL FOR A SETSWANA LANGUAGE CORPUS , 2003 .

[51]  František Čermák,et al.  Czech National Corpus: A Case in Many Contexts , 1997 .

[52]  Richard Kittredge,et al.  Sublanguage : studies of language in restricted semantic domains , 1982 .

[53]  Marco Baroni,et al.  Building general- and special-purpose corpora by Web crawling , 2006 .

[54]  Stuart James,et al.  Routledge Dictionary of Language and Linguistics , 1999 .

[55]  Udo Fries,et al.  Creating and Using English Language Corpora , 1994 .

[56]  Pieter de Haan The optimum corpus sample size , 1992 .

[57]  Willem Meijs,et al.  Book Reviews: Theory and Practice in Corpus Linguistics , 1991, CL.

[58]  W. Nelson Francis,et al.  FREQUENCY ANALYSIS OF ENGLISH USAGE: LEXICON AND GRAMMAR , 1983 .

[59]  Gilles-Maurice de Schryver,et al.  Electronic corpora as a basis for the compilation of African-language dictionaries, Part 1: The macrostructure , 2000 .

[60]  Frantisek Lichtenberk To List or Not to List: Writing a Dictionary of a Language Undergoing Rapid and Extensive Lexical Changes , 2003 .

[61]  R. A. Gardiner,et al.  Republic of Botswana , 1971 .

[62]  Geoffrey Finch,et al.  Linguistic terms and concepts , 1999 .

[63]  Jonathan Culpeper Computers, language and characterisation : an analysis of six characters in Romeo and Juliet. , 2002 .

[64]  Gregory F. Roberts,et al.  The home page as genre: a narrative approach , 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.

[65]  Arthur Dorros,et al.  My House , 1992 .

[66]  Niladri Sekhar Dash The process of designing a multidisciplinary monolingual sample corpus , 2000 .

[67]  Paul Rayson,et al.  Extending the Cochran rule for the comparison of word frequencies between corpora , 2004 .

[68]  G. Wisker The postgraduate research handbook , 2001 .

[69]  Frank Keller,et al.  Using the Web to Overcome Data Sparseness , 2002, EMNLP.

[70]  Pasi Tapanainen,et al.  What is a word, What is a sentence? Problems of Tokenization , 1994 .

[71]  Eugene A. Nida,et al.  Greek-English Lexicon of the New Testament Based on Semantic Domains , 1989 .

[72]  J. Léon Claimed and Unclaimed Sources of Corpus Linguistics , 2005 .

[73]  Sidney I. Landau,et al.  Dictionaries: The Art and Craft of Lexicography , 1985 .

[74]  Douglas Biber,et al.  Variation across speech and writing: Methodology , 1988 .

[75]  D. J. Gilles-Maurice Prinsloo,et al.  Monitoring the Stability of a Growing Organic Corpus, with special reference to Sepedi and Xitsonga , 2001 .

[76]  Geoffrey Leech,et al.  English Grammar for Today: A New Introduction , 1982 .

[77]  Ron Larson,et al.  Elementary Statistics: Picturing the World , 1999 .

[78]  Sidney I. Landau Dictionaries: The Art and Craft of Lexicography , 1985 .

[79]  Serge Sharo Creating General-Purpose Corpora Using Automated Search Engine Queries , 2006 .

[80]  David Malvern,et al.  Measuring vocabulary diversity using dedicated software , 2000 .

[81]  Geoffrey Leech,et al.  Grammatical word class variation within the British National Corpus sampler , 2002 .

[82]  Adam Kilgarriff,et al.  Introduction to the Special Issue on the Web as Corpus , 2003, CL.

[83]  Dutch ROGET'S THESAURUS , 1979 .

[84]  Betty Kirkpatrick,et al.  Roget's Thesaurus , 1852 .

[85]  E. Thorndike The Teacher's Word Book , 2007 .

[86]  Gregory Grefenstette,et al.  Web as Corpus , 2003 .

[87]  Douglas Biber,et al.  Dimensions of Register Variation: A Cross-Linguistic Comparison , 1995 .

[88]  Gilles-Maurice de Schryver,et al.  Electronic corpora as a basis for the compilation of African-language dictionaries, Part 2: The microstructure , 2000 .

[89]  O. Jespersen A modern English grammar on historical principles , 1928 .

[90]  Vincent B. Y. Ooi Computer Corpus Lexicography , 1998 .

[91]  Anthony Paul Cowie English Dictionaries for Foreign Learners: A History , 2000 .

[92]  X YingGuoPeiShengJiaoYuChuBanYou Longman Dictionary of Contemporary English , 1991 .

[93]  K. Demuth,et al.  Interaction between Discourse Functions and Agreement in Setawana , 1989 .

[94]  Manuel Montes-y-Gómez,et al.  A Corpus Balancing Method for Language Model Construction , 2003, CICLing.

[95]  Douglas Biber,et al.  Using Register-Diversified Corpora for General Language Studies , 1993, Comput. Linguistics.

[96]  Douglas Douglas,et al.  The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings , 1992, Comput. Humanit..

[97]  Mamokgethi Setati,et al.  Incomplete Journeys: Code-switching and Other Language Practices in Mathematics, Science and English Language Classrooms in South Africa , 2002 .

[98]  R. Burchfield,et al.  Studies in lexicography , 1990 .

[99]  Michael A. Shepherd,et al.  The evolution of cybergenres , 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.

[100]  Gabriela Maria Chiara Cavaglia Measuring the homogeneity and similarity of language corpora , 2005 .

[101]  Ulrich Heid,et al.  Relating Lexicon and Corpus: Computational Support for Corpus-Based Lexicon Building in DELIS , 1994 .

[102]  A. Zwicky,et al.  Chapter 9. Register as a Dimension of Linguistic Variation , 1982 .

[103]  J. Butler Excitable Speech. A Politics of the Performative , 1997 .

[104]  Aquilino Sánchez,et al.  Predictability of word forms (types) and lemmas in linguistic corpora. A Case Study Based on the Analysis of the CUMBRE Corpus:: an 8-million-word Corpus of contemporary Spanish , 1997 .

[105]  Paul Edward Rayson,et al.  Matrix : a statistical method and software tool for linguistic analysis through corpus comparison , 2003 .

[106]  Christopher S. Butler,et al.  Collocational frameworks in Spanish , 1998 .

[107]  S. Johansson,et al.  Word Frequencies in British and American English , 1985 .

[108]  Marco Baroni,et al.  Testing the extrapolation quality of word frequency models , 2006 .

[109]  Xavier Blanco,et al.  Multi-Lexemic Expressions: an overview , 2004 .

[110]  A. Sandilands Introduction to Tswana , 1953 .

[111]  Jean Aitchison Teach Yourself Linguistics , 1987 .

[112]  John Algeo,et al.  British and American Grammatical Differences , 1988 .

[113]  Judy Pearshall,et al.  The new Oxford dictionary of English. , 2000 .

[114]  Della Summers LEXICOGRAPHY-The importance of representativeness in relation to frequency , 2022 .

[115]  G. Zipf,et al.  The Psycho-Biology of Language , 1936 .

[116]  Kenneth Ward Church,et al.  Introduction to the Special Issue on Computational Linguistics Using Large Corpora , 1993, Comput. Linguistics.

[117]  Adam Kilgarriff,et al.  BUSINESS MODELS FOR DICTIONARIES AND NLP , 2000 .

[118]  M. Bagwasi The Functional Distribution of Setswana and English in Botswana , 2003 .

[119]  Martin Volk,et al.  Using the web as corpus for linguistic research , 2002 .

[120]  Samuel Johnson,et al.  Johnson's Dictionary: A Modern Selection , 1974 .

[121]  G. Leech,et al.  Social differentiation in the use of English vocabulary: some analyses of the conversational component of the British National Corpus , 1997 .

[122]  Pascual Cantos Gómez Do we need statistics when we have linguistics , 2002 .

[123]  Paul Rayson,et al.  Automatic Content Analysis of Spoken Discourse , 1992 .

[124]  Thapelo J. Otlogetswe Challenges to Issues of Balance and Representativeness in African Lexicography , 2010 .

[125]  A. P. Berber Sardinha Beginning Portuguese corpus linguistics: exploring a corpus to teach Portuguese as a foreign language , 1999 .

[126]  Tomaz Erjavec,et al.  The IJS-ELAN Slovene-English Parallel Corpus , 2002 .

[127]  C. Section Central Statistics Office - Releases and Publication , 2005 .

[128]  William H. Fletcher Making the Web More Useful as a Source for Linguistic Corpora , 2004 .

[129]  Herman Wekker,et al.  Topics in English linguistics , 1990 .

[130]  S. Johansson,et al.  Frequency analysis of English vocabulary and grammar : based on the LOB Corpus , 1989 .

[131]  C. M. Sperberg-McQueen,et al.  Guidelines for electronic text encoding and interchange , 1994 .

[132]  Pierre Hubert,et al.  A Model of Vocabulary Partition , 1988 .

[133]  Mick Short,et al.  Using Corpora for Language Research , 1998 .

[134]  Rufus H. Gouws,et al.  Principles and practice of South African lexicography , 2010 .

[135]  Paul Rayson,et al.  Comparing Corpora using Frequency Profiling , 2000, Proceedings of the workshop on Comparing corpora -.

[136]  Norma A. Pravec Survey of learner corpora , 2002 .

[137]  William H. Fletcher Concordancing the web: promise and problems, tools and techniques , 2007 .

[138]  Steve Crowdy,et al.  Spoken corpus transcription , 1994 .

[139]  R. Harald Baayen,et al.  Word Frequency Distributions , 2001 .

[140]  Göran Kjellmer,et al.  A dictionary of English collocations : based on the Brown corpus , 1994 .

[141]  Martin Wynne,et al.  Developing Linguistic Corpora: a Guide to Good Practice , 2005 .

[142]  Danie J. Prinsloo Revising Matumo's Setswana-English-Setswana Dictionary , 2010 .

[143]  Shana Poplack The care and handling of a mega-corpus: The Ottawa-Hull French project , 1989 .

[144]  Jens Allwood,et al.  Some Frequency based Differences between Spoken and Written Swedish , 1998 .

[145]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[146]  Emanuele Pianta,et al.  The MEANING Italian Corpus , 2003 .

[147]  David Y. W. Lee,et al.  Genres, Registers, Text Types, Domains and Styles: Clarifying the Concepts and Navigating a Path through the BNC Jungle , 2001 .

[148]  R. Moon Sinclair, lexicography, and the Cobuild Project: The application of theory , 2007 .

[149]  Gilles-Maurice de Schryver Web for/as corpus: a perspective for the African languages , 2002 .

[150]  Rayid Ghani,et al.  Mining the web to create minority language corpora , 2001, CIKM '01.

[151]  Nancy Ide,et al.  An American national corpus: a proposal , 1998, LREC.

[152]  Paul Rayson,et al.  Corpus linguistics around the world , 2006 .

[153]  Dimane Mpoeleng,et al.  Human-computer interface design issues for a multi-cultural and multi-lingual English speaking country Botswana , 2001, Interact. Comput..

[154]  Rufus H. Gouws,et al.  Formulating a new dictionary convention for the lemmatization of verbs in Northern Sotho , 1996 .

[155]  Laurent Besacier,et al.  Using the web for fast language model construction in minority languages , 2003, INTERSPEECH.

[156]  Aquilino Sánchez,et al.  Lexical Constellations: what collocates fail to tell , 2001 .

[157]  H. Lichtenstein,et al.  Travels in Southern Africa in the Years, 1803, 1804, 1805 and 1806 , 2010 .

[158]  C. Chapelle The Computational Analysis of English—A Corpus‐Based Approach , 1988 .

[159]  John Sinclair,et al.  Corpus, Concordance, Collocation , 1991 .

[160]  Nicholas Ostler,et al.  Corpus Design Criteria , 1992 .

[161]  John J. Kovarik How Should a Large Corpus Be Built?-A Comparative Study of Closure in Annotated Newspaper Corpora from Two Chinese Sources, Towards Building a Larger Representative Corpus Merged from Representative Sublanguage Collections , 2000, ACL 2000.

[162]  Steve Crowdy Spoken Corpus Design , 1993 .

[163]  L. Zgusta Manual of Lexicography , 1971 .

[164]  Wolfgang Teubert,et al.  Corpus Linguistics and Lexicography , 2001 .

[165]  Della Summers,et al.  Longman Dictionary of Contemporary English , 1995 .

[166]  Ralph Grishman,et al.  The influence of corpora on lexicons: corpora use in the creation of COMLEX syntax and NOMLEX , 2000 .

[167]  J. Campbell Travels in South Africa, undertaken at the request of the London Missionary Society : being a narrative of a second journey in the interior of that country , 1822 .

[168]  Grace Song,et al.  Making Sense of Corpus Data: A Case Study of Verbs of Sound , 1996 .

[169]  Norman Thomson How to Read Articles which Depend on Statistics , 1989 .

[170]  A. Diaz De Ilarraza HIZKING21: Integrating language engineering resources and tools into systems with linguistic capa , 2003 .

[171]  Mark Aronoff,et al.  Contemporary linguistics: An introduction , 1989 .

[172]  Adam Kilgarriff,et al.  WORD SKETCH: Extraction and Display of Signicant Collocations for Lexicography , 2000 .

[173]  H. Kucera,et al.  Computational analysis of present-day American English , 1967 .

[174]  Adam Kilgarriff,et al.  Putting frequencies in the dictionary , 1997 .

[175]  G. Aston Text Categories and Corpus Users: A Response to David Lee. , 2001 .

[176]  Adam Kilgarriff,et al.  The Sketch Engine , 2004 .

[177]  Marco Baroni,et al.  37. Distributions in text , 2009 .

[178]  Jack B. Martin,et al.  Practical and Ethical Issues in Lexicography: Examples from the Creek Dictionary Project , 1996 .

[179]  Emanuele Pianta,et al.  Detecting hidden multiwords in bilingual dictionaries , 2002 .

[180]  Robert Sigley,et al.  Text categories and where you can stick them : A crude formality index , 1997 .

[181]  Geoffrey Barnbrook Language and Computers: A Practical Introduction to the Computer Analysis of Language , 1996 .

[182]  Christian Mair Problems in the compilation of a corpus of standard Caribbean English: A pilot study , 1992 .

[183]  Chris Brew,et al.  Word-Pair Extraction for Lexicography , 1996 .

[184]  R. Alston The English dictionary , 1966 .

[185]  Vasileios Hatzivassiloglou,et al.  Do we Need Linguistics When We Have Statistics? A Comparative Analysis of the Contributions of Linguistic Cues to a Statistical Word Grouping System , 1994 .

[186]  Stefan Evert,et al.  The Statistics of Word Cooccur-rences: Word Pairs and Collocations , 2004 .

[187]  J. Arthur "There Must Be Something Undiscovered Which Prevents Us from Doing Our Work Well": Botswana Primary Teachers' Views on Educational Language Policy. , 1997 .

[188]  C. Meyer English Corpus Linguistics An Introduction , 2002 .

[189]  A. Wierzbicka Lexicography and conceptual analysis , 1984 .

[190]  Barbara F. Grimes Ethnologue Languages of the World , 1988 .

[191]  Richard Xiao,et al.  Corpus Creation , 2010, Handbook of Natural Language Processing.

[192]  Andrew Harley,et al.  The Role of Corpora in Compiling the Cambridge International Dictionary of English , 1996 .

[193]  Nancy Ide,et al.  The American National Corpus: More Than the Web Can Provide , 2002, LREC.

[194]  E. W. Smith Kinship Terminology of the South African Bantu . By N. J. v. Warmelo. Pp. 119. The Government Printer, Pretoria. , 1933, Africa.

[195]  Adam Kilgarriff,et al.  Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity between Corpora , 1997, VLC.

[196]  Zuraidah Mohd Don,et al.  The notion of a “lemma”: Headwords, roots and lexical sets , 2004 .

[197]  Tony Berber Sardinha USING KEY WORDS IN TEXT ANALYSIS: PRACTICAL ASPECTS , 1999 .

[198]  Gary F. Simons In search of task-centered software: building single-purpose tools from multipurpose components , 1998 .

[199]  Evadne Adrain-Vallance Longman essential activator , 2006 .