aConCorde: Towards an open-source, extendable concordancer for Arabic

There is currently a surge of activity in Arabic corpus linguistics. As the number of available Arabic corpora continues to grow, there is an increasing need for robust tools that can process this data, whether for research or teaching. One tool that serves both purposes is the concordancer, a simple tool for displaying a specified target word in its context. However, obtaining one that copes reliably with the Arabic language has proved difficult. There was also a desire to extend the standard concordancer with novel features that enhance its usefulness in the classroom; easy-to-use root- and stem-based concordance and integration with corpus clustering algorithms are two examples. aConCorde was therefore created to provide such a tool to the community.
