Experimenting a discriminative possibilistic classifier with reweighting model for Arabic morphological disambiguation

We perform Arabic morphological disambiguation on unlabeled vocalized corpora. We experiment with possibilistic measures for classifying imprecise morphological data. We assess the impact of a reweighting model and of a possibilistic lexical likelihood. Possibilistic classification is accurate in disambiguating both modern and classical texts.

In this paper, we experiment with a discriminative possibilistic classifier with a reweighting model for the morphological disambiguation of Arabic texts. The main idea is to provide a possibilistic classifier that automatically acquires disambiguation knowledge from vocalized corpora and is then tested on non-vocalized texts. First, we determine all the possible analyses of the vocalized words using a morphological analyzer. The values of their morphological features are used to train the classifier. The testing phase consists of identifying the correct class value (i.e., the value of a morphological feature) from the features of the preceding and following words. The selected class is the one with the greatest value of a possibilistic measure computed over the training set. To discriminate the effect of each feature, we add the weights of the training attributes to this measure. To assess this approach, we carry out experiments on a corpus of Arabic stories and on the Arabic Treebank. We present results for all the morphological features, determine to what degree the discriminative approach improves disambiguation rates, and extract the dependency relationships among the features. The results show the contribution of possibility theory to resolving ambiguities in real applications. We also compare the success rates on modern versus classical Arabic texts. Finally, we attempt to evaluate the impact of the lexical likelihood on morphological disambiguation.
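
The testing phase described above can be sketched as a small naive possibilistic classifier. This is a minimal illustration under stated assumptions. It is not the authors' implementation.

- Each training example is reduced to a single class label. The approach above instead trains on all candidate analyses of the vocalized words.
- The context attributes are hypothetically named `prev_pos` and `next_pos`.
- The conditional possibility Π(c | a = v) is taken as the count of class c with attribute value v, divided by the count of the most frequent class for that value.
- The attribute weights are applied as exponents inside a product.

The class `WeightedPossibilisticClassifier`, the tag names and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import prod


class WeightedPossibilisticClassifier:
    """Hypothetical sketch: naive possibilistic classifier with attribute weights."""

    def __init__(self, weights=None):
        # Assumed non-negative per-attribute weights; unlisted attributes default to 1.0.
        self.weights = weights or {}
        # (attribute, value) -> class label -> co-occurrence count
        self.counts = defaultdict(lambda: defaultdict(int))
        self.classes = set()

    def fit(self, samples):
        # samples: iterable of (features: dict attribute -> value, class label)
        for features, label in samples:
            self.classes.add(label)
            for attr, value in features.items():
                self.counts[(attr, value)][label] += 1
        return self

    def possibility(self, attr, value, label):
        # Pi(label | attr = value): count normalised by the most frequent class,
        # so the most plausible class for this value gets possibility 1.
        row = self.counts.get((attr, value))
        if not row:
            return 1.0  # unseen value: total ignorance, every class fully possible
        return row.get(label, 0) / max(row.values())

    def score(self, features, label):
        # Weighted product: each degree is raised to its attribute weight,
        # so a weight of 0 turns the attribute off and scores stay in [0, 1].
        return prod(
            self.possibility(attr, value, label) ** self.weights.get(attr, 1.0)
            for attr, value in features.items()
        )

    def predict(self, features):
        # The class with the greatest possibilistic score; sorting makes ties deterministic.
        return max(sorted(self.classes), key=lambda c: self.score(features, c))


# Hypothetical toy data: tags of the preceding and following words -> tag of the current word.
train = [
    ({"prev_pos": "PREP", "next_pos": "ADJ"}, "NOUN"),
    ({"prev_pos": "PREP", "next_pos": "NOUN"}, "NOUN"),
    ({"prev_pos": "PRON", "next_pos": "NOUN"}, "VERB"),
    ({"prev_pos": "PRON", "next_pos": "PREP"}, "VERB"),
    ({"prev_pos": "PREP", "next_pos": "PREP"}, "VERB"),
]

clf = WeightedPossibilisticClassifier().fit(train)
print(clf.predict({"prev_pos": "PREP", "next_pos": "NOUN"}))  # NOUN
print(clf.predict({"prev_pos": "PREP", "next_pos": "PREP"}))  # VERB (0.5 vs 0.0)

# Switching off the right-hand context changes the decision.
weighted = WeightedPossibilisticClassifier(weights={"next_pos": 0.0}).fit(train)
print(weighted.predict({"prev_pos": "PREP", "next_pos": "PREP"}))  # NOUN
```

Each possibility degree is raised to a non-negative weight. This keeps every score in [0, 1], and a weight of 0 silences an attribute entirely. That is one simple reading of adding attribute weights to the measure. An additive or entropy-based weighting would be an equally plausible reading of the abstract.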
