Selecting indexing strings using adaptation
It is not easy to tokenize agglutinative languages like Japanese and Chinese into words. Many IR systems start with a dictionary-based morphology program like ChaSen [4]. Unfortunately, dictionaries cannot cover all possible words; unknown words such as proper nouns are important for IR. This paper proposes a statistical dictionary-free method for selecting index strings based on recent work on adaptive language modeling.
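The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that adaptation is estimated, as in [1], by the ratio df2/df1, where df1 is the number of documents containing a string at least once and df2 is the number containing it at least twice. The function name `adaptation_scores`, the fixed n-gram length `n`, and the `min_df` cutoff are all hypothetical choices made for illustration.

```python
from collections import Counter


def adaptation_scores(docs, n=2, min_df=2):
    """Estimate Pr(k >= 2 | k >= 1) for every character n-gram.

    Hypothetical helper: no dictionary or tokenizer is used; candidates
    are simply all (overlapping) character n-grams of each document.
    """
    df1 = Counter()  # documents containing the n-gram at least once
    df2 = Counter()  # documents containing the n-gram at least twice
    for doc in docs:
        counts = Counter(doc[i:i + n] for i in range(len(doc) - n + 1))
        for gram, count in counts.items():
            df1[gram] += 1
            if count >= 2:
                df2[gram] += 1
    # Ignore very rare n-grams, whose ratio would be unreliable (assumed cutoff).
    return {g: df2[g] / df1[g] for g in df1 if df1[g] >= min_df}


docs = ["abab xx", "ab cd", "cdcd ab"]
scores = adaptation_scores(docs, n=2)
# Rank candidate index strings by how strongly they repeat within a document.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumption, strings that tend to recur within the documents that mention them score high and would be preferred as index strings, while strings that are spread thinly across many documents score low.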
[1] Kenneth Ward Church. Empirical Estimates of Adaptation: The Chance of Two Noriegas is Closer to p/2 than p^2, 2000, COLING.
[2] Kenneth Ward Church, et al. Empirical Term Weighting and Expansion Frequency, 2000, EMNLP.