
Words Stemming Based on Structural and Semantic Similarity

Word stemming is an important problem in natural language processing and information retrieval. Most existing stemming methods are language-dependent and are therefore applicable only to particular languages. Given the importance of this problem, the stemming method proposed in this paper is designed to be language-independent. The proposed stemmer uses a bilingual dictionary, and all words in the dictionary are first clustered according to their structural and semantic similarity. The stem of a new word is then found by making use of the previously formed clusters. To evaluate the proposed scheme, stemming is performed on both Persian and English. The results are encouraging and indicate that the proposed method performs well compared with its counterparts.
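The two-stage pipeline described above (cluster the dictionary, then stem unseen words against the clusters) can be illustrated with a minimal Python sketch. The abstract does not say which similarity measures, clustering algorithm, or stem-selection rule the authors use, so each of these is a hypothetical stand-in chosen here for illustration: structural similarity is the shared-prefix length divided by the shorter word's length, semantic similarity is the Jaccard overlap of two words' translation sets in the bilingual dictionary, clustering is single-linkage grouping via union-find under two assumed thresholds, and a cluster's stem is the prefix common to all its members. The function names, thresholds, and the toy English-to-Persian (transliterated) dictionary are likewise assumptions; this is not the paper's actual method.

```python
from itertools import combinations
from os.path import commonprefix  # character-wise longest common prefix

# Hypothetical toy bilingual dictionary: English word -> transliterated Persian translations.
toy_dictionary: dict[str, set[str]] = {
    "connect": {"vasl", "ettesal"},
    "connected": {"vasl", "motasel"},
    "connection": {"ettesal", "vasl"},
    "conference": {"hamayesh"},
    "run": {"davidan"},
    "running": {"davidan"},
}


def structural_similarity(a: str, b: str) -> float:
    """Assumed measure: shared-prefix length divided by the shorter word's length."""
    if not a or not b:
        return 0.0
    return len(commonprefix([a, b])) / min(len(a), len(b))


def semantic_similarity(ta: set[str], tb: set[str]) -> float:
    """Assumed measure: Jaccard overlap of the two words' translation sets."""
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_dictionary(dictionary: dict[str, set[str]],
                       struct_min: float = 0.6,
                       sem_min: float = 0.3) -> list[set[str]]:
    """Single-linkage clustering with union-find: two words are linked when both
    similarities reach their (assumed) thresholds."""
    parent = {w: w for w in dictionary}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for a, b in combinations(dictionary, 2):
        if (structural_similarity(a, b) >= struct_min
                and semantic_similarity(dictionary[a], dictionary[b]) >= sem_min):
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for w in dictionary:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


def cluster_stem(cluster: set[str]) -> str:
    """Assumed rule: a cluster's stem is the prefix shared by all of its members."""
    return commonprefix(sorted(cluster))


def stem(word: str, clusters: list[set[str]]) -> str:
    """Stem an unseen word with the longest cluster stem it starts with;
    fall back to the word itself when no cluster matches."""
    matches = [s for s in map(cluster_stem, clusters) if s and word.startswith(s)]
    return max(matches, key=len) if matches else word


if __name__ == "__main__":
    clusters = cluster_dictionary(toy_dictionary)
    print(sorted(sorted(c) for c in clusters))
    # [['conference'], ['connect', 'connected', 'connection'], ['run', 'running']]
    print(stem("connecting", clusters), stem("runs", clusters), stem("walked", clusters))
    # connect run walked
```

With these toy thresholds, "connect", "connected" and "connection" share both a long prefix and at least one translation, so they form one cluster, while "conference" shares only the prefix "con" and no translation, so it stays apart; the unseen word "connecting" is then reduced to "connect". A prefix-based stem rule only covers suffixing morphology. Since Persian also attaches prefixes (for example the verbal prefix "mi-"), a realistic implementation would need a structural measure that is not anchored at the start of the word.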

Seyed Mostafa Fakhrahmad | Mohammad Hadi Sadreddini | Hossein Taghi-Zadeh | Amir Hossein Rasekh | Mohammad Hassan Dianati
