What Corpus Linguistics can offer Contact Linguistics: the C-ORAL-BRASIL corpus experience

Contact Linguistics has, throughout its history, been a largely data-oriented subdiscipline. From the word lists gathered in colonial settings by pioneer scholars to the current compilation of narratives, interviews and databanks, Contact Linguistics, unlike other subdisciplines of Linguistics, has striven to base its findings on the analysis of language actually produced by speakers, and not solely on introspective methodologies. Corpus Linguistics, for its part, has brought innovative methodological approaches to nearly every subfield of linguistic research. Corpus compilation parameters have taken representativeness and balance seriously, and this, in turn, has aided the search for generalizations about linguistic systems. This paper presents C-ORAL-BRASIL, a spontaneous speech corpus of informal Brazilian Portuguese, and highlights some of its characteristics that may be profitably explored from a Contact Linguistics perspective.
