Paper information - Hidden Markov model based Arabic morphological analyzer

Hidden Markov model based Arabic morphological analyzer

Natural language processing tasks include summarization, machine translation, question understanding, and part-of-speech tagging. To accomplish these tasks, a proper language representation must be defined, and some systems use roots and stems as that representation. A word must therefore be processed to extract its root or stem. This paper presents a new technique that extracts word weights by stripping prefixes and suffixes off a given word. The technique is based on a Hidden Markov Model (HMM). A path from the start state to the end state represents a word, and each state along the path accounts for letters of that word. The states are prefixes, weights, and suffixes. The selected path is the one that gives the word the highest likelihood. The approach achieves a promising 95% performance.

Keywords: natural language processing, morphology, hidden Markov model, stem.
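The path idea in the abstract can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the paper's model: the three states, the letter sets, every probability value, and the function names `emit` and `segment` are hypothetical toy choices, the emission values are unnormalised scores, and words are written in a Buckwalter-style Latin transliteration. The middle state simply absorbs the letters left between prefix and suffix, standing in for the weight.

```python
import math

# Hypothetical toy model: every state emits exactly one letter.
STATES = ("prefix", "weight", "suffix")
PREFIX_LETTERS = set("Alwbf")   # assumed letters typical of prefixes
SUFFIX_LETTERS = set("wnAthy")  # assumed letters typical of suffixes

# Assumed start, transition, and end probabilities (illustrative only).
START = {"prefix": 0.5, "weight": 0.5, "suffix": 0.0}
TRANS = {
    "prefix": {"prefix": 0.3, "weight": 0.7, "suffix": 0.0},
    "weight": {"prefix": 0.0, "weight": 0.7, "suffix": 0.3},
    "suffix": {"prefix": 0.0, "weight": 0.0, "suffix": 1.0},
}
END = {"prefix": 0.0, "weight": 0.5, "suffix": 0.5}


def emit(state, ch):
    """Toy, unnormalised emission score of letter ch in the given state."""
    if state == "prefix":
        return 0.2 if ch in PREFIX_LETTERS else 0.001
    if state == "suffix":
        return 0.2 if ch in SUFFIX_LETTERS else 0.001
    return 0.05  # the weight state accepts any letter equally


def log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else -math.inf


def segment(word):
    """Return (prefix, weight, suffix) along the highest-scoring state path."""
    if not word:
        raise ValueError("word must not be empty")
    # best[i][s]: best log-score of a path emitting word[:i + 1] and ending in s
    best = [{s: log(START[s]) + log(emit(s, word[0])) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for ch in word[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + log(TRANS[p][s]))
            scores[s] = best[-1][prev] + log(TRANS[prev][s]) + log(emit(s, ch))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Pick the final state, including the transition into the end state.
    last = max(STATES, key=lambda s: best[-1][s] + log(END[s]))
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    # Group the letters by the state that emitted them.
    parts = {s: "" for s in STATES}
    for ch, s in zip(word, path):
        parts[s] += ch
    return parts["prefix"], parts["weight"], parts["suffix"]
```

With these toy numbers, `segment("AlmElmwn")` returns `('Al', 'mElm', 'wn')`. The zero entries in the transition table are what force every path into the order prefix, weight, suffix, so a different set of assumed probabilities changes which split wins but not the shape of the path.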
