Automatic Processing of Dialectalized Arabic: Methodological and Algorithmic Aspects

Houda Saadane
