Paper Information - Finding the Similarity between Two Arabic Texts

Finding the Similarity between Two Arabic Texts

Calculating the similarity between texts written in one or more languages is still one of the most important challenges facing natural language processing. This work presents several approaches used for text similarity. The proposed system finds the similarity between two Arabic texts using a hybrid of similarity measures: a semantic similarity measure, the cosine similarity measure, and an N-gram measure (using the Dice similarity measure). The system includes an Arabic SemanticNet that stores the keywords of a specific field (computer science); through this network, the semantic similarity between words can be computed according to specific equations. The cosine and N-gram similarity measures are used to find similar character sequences. The proposed system was implemented in Visual Basic 2012, and testing showed it to be effective at finding the similarity between two Arabic texts in terms of both accuracy and search time.
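As a rough illustration of two of the measures named in the abstract, the following Python sketch computes a Dice coefficient over character N-grams and a cosine similarity over term-frequency vectors. It is not the authors' Visual Basic implementation: the function names, the choice of bigrams (n = 2), whitespace tokenization, and the use of word-level vectors for the cosine measure are all assumptions made for this example. The SemanticNet-based measure and the way the three scores are combined are not specified in the abstract and are not sketched here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return the character n-grams of each whitespace-separated word.

    Assumption: n-grams do not span word boundaries.
    """
    grams = []
    for word in text.split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over n-gram sets: 2 * |A & B| / (|A| + |B|)."""
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0  # no n-grams on either side: treat as no similarity
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-frequency vectors of a and b."""
    vec_a, vec_b = Counter(a.split()), Counter(b.split())
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(c * c for c in vec_a.values()))
    norm_b = sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two related Arabic words sharing the bigrams "حا" and "اس".
    print(dice_similarity("حاسوب", "حاسب"))
    print(cosine_similarity("علم الحاسوب", "علم البيانات"))
```

Because Python strings are sequences of Unicode code points, the same functions work on Arabic text without special handling, although a real system would probably normalize diacritics and letter variants before comparing.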

Suhad Malallah kadhem | Aseel Qassim Abd Alameer

[1] David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[2] Wael Hassan Gomaa, et al. A Survey of Text Similarity Approaches, 2013.

[3] Suleiman H. Mustafa, et al. N-Gram-Based Techniques for Arabic Text Document Matching; Case Study: Courses Accreditation, 2012.

[4] Shalini Puri, et al. A Fuzzy Similarity Based Concept Mining Model for Text Classification, 2012, ArXiv.

[5] F. R.. Two Semantic Networks: Their Computation and Use for Understanding English Sentences, 2022.

[6] K. Sree, et al. Clustering Based on Cosine Similarity Measure, 2012.

[7] Suphakit Niwattanakul, et al. A Method for Measuring Keywords Similarity by Applying Jaccard’s, N-Gram and Vector Space, 2013.

[8] Jin Feng, et al. Sentence Similarity Based on Relevance, 2008.

[9] Rakhi Chakraborty. Domain Keyword Extraction Technique: A New Weighting Method Based on Frequency Analysis, 2013.