"I am borrowing ya mixing?" An Analysis of English-Hindi Code Mixing in Facebook

Code-Mixing is a frequently observed phenomenon in social media content generated by multilingual users. Processing such data, whether for linguistic analysis or for computational modelling, is challenging. The difficulty comes from the linguistic complexity created by the nature of the mixing, as well as from non-standard variation in spelling and grammar and from transliteration. Our analysis shows the extent of Code-Mixing in English-Hindi Facebook data. The classification of Code-Mixed words based on frequency and linguistic typology underlines the fact that, while there are easily identifiable cases of borrowing and of mixing at the two ends of the scale, a large majority of the words form a continuum in the middle. This emphasizes the need to handle these words at different levels when processing the data automatically.
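The abstract does not spell out how words are placed on the borrowing-mixing scale. The Python code below is only a minimal sketch of what a frequency-based view of that continuum might look like, and it is not the authors' method. It makes several assumptions: every token already carries a word-level language tag (`"en"` or `"hi"`); the matrix language of a sentence is approximated by majority token count; and the thresholds `low` and `high` are arbitrary placeholders. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical input format: a sentence is a list of (token, lang) pairs,
# where lang is "en" or "hi" as assigned by some word-level language tagger.
Sentence = list[tuple[str, str]]


def mixed_sentence_ratio(sentences: list[Sentence]) -> float:
    """Fraction of sentences that contain tokens from both languages."""
    if not sentences:
        return 0.0
    mixed = sum(1 for s in sentences if {lang for _, lang in s} >= {"en", "hi"})
    return mixed / len(sentences)


def matrix_language(sentence: Sentence) -> str:
    """Crude proxy for the matrix language: the majority tag (ties go to 'hi')."""
    counts = Counter(lang for _, lang in sentence)
    return "en" if counts["en"] > counts["hi"] else "hi"


def embedded_english_counts(sentences: list[Sentence]) -> Counter:
    """Count English tokens (lowercased) occurring inside Hindi-matrix sentences."""
    counts: Counter = Counter()
    for s in sentences:
        if matrix_language(s) == "hi":
            counts.update(tok.lower() for tok, lang in s if lang == "en")
    return counts


def classify(counts: Counter, low: int = 2, high: int = 10) -> dict[str, str]:
    """Assign each embedded English word to one end of the scale or the middle.

    Frequent embedded words are treated as borrowing-like, rare ones as
    mixing-like, and everything in between as part of the continuum.
    The thresholds are placeholders, not values taken from the paper.
    """
    labels: dict[str, str] = {}
    for word, n in counts.items():
        if n >= high:
            labels[word] = "borrowing-like"
        elif n < low:
            labels[word] = "mixing-like"
        else:
            labels[word] = "continuum"
    return labels


if __name__ == "__main__":
    # Toy romanized examples: "main school ja raha hoon" (I am going to school),
    # "school mein exam hai" (there is an exam at school), plus one English sentence.
    corpus = [
        [("main", "hi"), ("school", "en"), ("ja", "hi"), ("raha", "hi"), ("hoon", "hi")],
        [("school", "en"), ("mein", "hi"), ("exam", "en"), ("hai", "hi")],
        [("this", "en"), ("is", "en"), ("great", "en")],
    ]
    print(mixed_sentence_ratio(corpus))
    print(classify(embedded_english_counts(corpus), low=2, high=2))
```

A pure frequency cut-off like this flattens the picture the abstract describes: it produces two ends and a middle band, but deciding where a given word really sits would also need the typological evidence (morphological integration, phonological adaptation, and so on) that a raw count cannot capture.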
