Text similarity calculation has become a key issue in many applications, such as information retrieval, semantic disambiguation, and automatic question answering. There is an increasing need for similarity calculations at different levels, e.g., characters, vocabulary, syntactic structure, and semantics. Most existing semantic similarity algorithms can be categorized into statistics-based methods, rule-based methods, and combinations of the two. Statistics-based methods use knowledge bases to incorporate more comprehensive knowledge and are able to reduce knowledge noise, so they tend to achieve better performance. Nevertheless, because items are unevenly distributed in the knowledge base, their semantic similarity performance on low-frequency words is usually poor. In this work, based on the distribution of stop-words, we propose a weight normalization method for semantic dimensions. The proposed method exploits the semantic independence of stop-words to counteract the corpus's semantic bias in statistical methods, further improving the accuracy of semantic similarity computation. Experiments comparing the proposed method with several existing algorithms show its effectiveness.
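The abstract does not give the concrete procedure, but the idea it describes can be sketched as follows: since stop-words carry little topical meaning, their average weight on each semantic dimension estimates the corpus's bias toward that dimension, and dividing content-word vectors by this bias normalizes the dimensions before computing cosine similarity. Everything below (the toy concept vectors, the stop-word set, and the division-based normalization) is an illustrative assumption, not the paper's actual formulation.

```python
import math

# Toy "concept space": each word maps to weights over semantic dimensions
# (e.g., Wikipedia concepts in ESA-style methods). All values are illustrative.
concept_vectors = {
    # content words
    "car":   [0.9, 0.1, 0.4],
    "auto":  [0.8, 0.2, 0.5],
    "fruit": [0.1, 0.9, 0.4],
    # stop-words, assumed to be semantically neutral
    "the":   [0.6, 0.2, 0.4],
    "of":    [0.5, 0.3, 0.5],
}
STOP_WORDS = {"the", "of"}

def dimension_bias(vectors, stop_words):
    """Average stop-word weight per dimension: a semantically neutral word
    should spread evenly, so deviations reveal the corpus's bias."""
    dims = len(next(iter(vectors.values())))
    bias = [0.0] * dims
    for w in stop_words:
        for i, v in enumerate(vectors[w]):
            bias[i] += v
    return [b / len(stop_words) for b in bias]

def normalize(vec, bias):
    """Down-weight dimensions the corpus over-represents."""
    return [v / b if b > 0 else v for v, b in zip(vec, bias)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

bias = dimension_bias(concept_vectors, STOP_WORDS)

def similarity(w1, w2):
    """Similarity in the bias-normalized semantic space."""
    return cosine(normalize(concept_vectors[w1], bias),
                  normalize(concept_vectors[w2], bias))
```

With these toy vectors, `similarity("car", "auto")` comes out higher than `similarity("car", "fruit")`, as expected; the normalization step would matter most when one dimension is systematically inflated across the whole corpus.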