Classifiers combination to Arabic morphosyntactic disambiguation

Part-of-speech tagging is an important preprocessing step in many natural language processing applications, such as text summarization, question answering, and information retrieval. Morphosyntactic disambiguation (part-of-speech tagging) is the process of assigning every word in a given context its appropriate part of speech. In this paper, we first review the supervised machine learning approaches that have been used for part-of-speech tagging. We then review the existing work on Arabic, both to compare approaches and to confirm the need for an accurate and efficient Arabic morphosyntactic disambiguation system. Finally, we propose an experimental classifier combination framework for Arabic part-of-speech tagging, in which three diverse classifiers (Hidden Markov Model, Maximum Entropy, and Transformation-Based Learning) are combined using several different combination strategies to exploit their respective advantages.
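The abstract does not say which combination strategies are used, so the following is only a minimal sketch of two common ones, simple majority voting and accuracy-weighted voting, applied to the per-token outputs of three taggers. The function names, tag labels, and weights are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict


def majority_vote(predictions, tie_breaker=0):
    """Combine tag sequences by per-token majority vote.

    predictions: list of tag sequences, one per tagger, all of equal length.
    tie_breaker: index of the tagger whose tag wins when votes are tied
                 (an assumed policy, e.g. the tagger with the best held-out accuracy).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all taggers must label the same number of tokens")
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        winners = [t for t in token_tags if counts[t] == best]
        preferred = token_tags[tie_breaker]
        combined.append(preferred if preferred in winners else winners[0])
    return combined


def weighted_vote(predictions, weights):
    """Combine tag sequences by summing a per-tagger weight for each proposed tag."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per tagger is required")
    combined = []
    for token_tags in zip(*predictions):
        scores = defaultdict(float)
        for tag, weight in zip(token_tags, weights):
            scores[tag] += weight
        combined.append(max(scores, key=scores.get))
    return combined


# Hypothetical outputs of three taggers for a three-token sentence.
hmm_tags = ["NOUN", "VERB", "ADJ"]
maxent_tags = ["NOUN", "NOUN", "ADJ"]
tbl_tags = ["VERB", "NOUN", "NOUN"]

voted = majority_vote([hmm_tags, maxent_tags, tbl_tags])
weighted = weighted_vote([hmm_tags, maxent_tags, tbl_tags], [0.9, 0.3, 0.3])
```

Majority voting treats the three taggers as equals, whereas weighted voting lets a tagger that is assumed to be more reliable override the other two when they agree against it.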
