Building Word Representations for Wolof Using Neural Networks

Because a large portion of the population in rural areas of sub-Saharan Africa understands only local languages, many people there cannot access all the content available on the World Wide Web. Most content is available in widely spoken languages such as English, Spanish, and French. Content in low-resource languages such as Wolof, which is mostly spoken in Senegal, is scarce. Automatic systems for natural language understanding, such as machine translation systems that can transform information from common languages to low-resource languages, would allow people in rural areas to access relevant scientific or health content.
