Paper information - Selecting indexing strings using adaptation

Selecting indexing strings using adaptation

It is not easy to tokenize agglutinative languages like Japanese and Chinese into words. Many IR systems start with a dictionary-based morphology program like ChaSen [4]. Unfortunately, dictionaries cannot cover all possible words; unknown words such as proper nouns are important for IR. This paper proposes a statistical dictionary-free method for selecting index strings based on recent work on adaptive language modeling.
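
As a rough illustration of the kind of statistic involved, the sketch below estimates adaptation for a character string as df2/df1, the estimate of Pr(k >= 2 | k >= 1) associated with [1], where df_k is the number of documents containing the string at least k times. It is not the paper's actual selection procedure: the function names, the character n-gram candidates, and the `max_len`, `threshold`, and `min_df` parameters are all assumptions made for this example.

```python
def count_occurrences(text, s):
    """Count occurrences of s in text, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = text.find(s, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def doc_freqs(docs, s):
    """Return (df1, df2): documents containing s at least once / at least twice."""
    counts = [count_occurrences(d, s) for d in docs]
    df1 = sum(1 for c in counts if c >= 1)
    df2 = sum(1 for c in counts if c >= 2)
    return df1, df2


def adaptation(docs, s):
    """Estimate Pr(k >= 2 | k >= 1) as df2 / df1 (0.0 if s never occurs)."""
    df1, df2 = doc_freqs(docs, s)
    return df2 / df1 if df1 else 0.0


def candidate_strings(docs, max_len):
    """Collect every character n-gram of length 1..max_len; no dictionary is used."""
    seen = set()
    for d in docs:
        for n in range(1, max_len + 1):
            for i in range(len(d) - n + 1):
                seen.add(d[i:i + n])
    return seen


def select_index_strings(docs, max_len=4, threshold=0.5, min_df=2):
    """Keep strings that occur in enough documents and tend to repeat within them.

    The threshold and min_df cutoffs are hypothetical, chosen only for illustration.
    """
    selected = set()
    for s in candidate_strings(docs, max_len):
        df1, df2 = doc_freqs(docs, s)
        if df1 >= min_df and df2 / df1 >= threshold:
            selected.add(s)
    return selected
```

Because the candidates are raw character n-grams, an unknown word such as a proper noun can be scored without appearing in any dictionary, which is the motivation the abstract gives for a dictionary-free approach.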

Yoshiyuki Takeda | Kyoji Umemura

[1] Kenneth Ward Church. Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p^2, 2000, COLING.

[2] Kenneth Ward Church, et al. Empirical Term Weighting and Expansion Frequency, 2000, EMNLP.