Topic-Oriented Information Detection and Scoring

This paper introduces a new approach to topic-oriented information detection and scoring (TOIDS) based on a hybrid design that integrates characteristic word combination with self-learning. In the characteristic word combination approach, both topic-related and topic-unrelated words are used to judge a webpage's relevance. To address the domain adaptation problem, our self-learning technique uses historical information from the characteristic word lexicon to facilitate detection. Empirical results indicate that the proposed approach outperforms benchmark systems, achieving higher precision. We also demonstrate that our approach can be easily adapted to different domains.
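
To make the idea concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a lexicon that maps related words to positive weights and unrelated words to negative weights, scores a page by the average weight of its tokens, and lets confidently scored pages adjust the lexicon. The function names (`score_page`, `self_learn`), the weighting scheme, and the `threshold` and `step` parameters are all hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_page(text, lexicon):
    """Average lexicon weight over all tokens of the page.

    Related words carry positive weights and unrelated words carry
    negative weights, so both kinds of evidence influence the score.
    (Hypothetical scoring rule, assumed for illustration.)
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


def self_learn(text, lexicon, threshold=0.1, step=0.1):
    """Return an updated copy of the lexicon after seeing one page.

    If the page is scored confidently (|score| >= threshold), the weight
    of every word on the page is nudged toward the page's label and
    clamped to [-1, 1]. Otherwise the lexicon is returned unchanged.
    (Hypothetical update rule, assumed for illustration.)
    """
    score = score_page(text, lexicon)
    if abs(score) < threshold:
        return dict(lexicon)
    direction = 1.0 if score > 0 else -1.0
    updated = dict(lexicon)
    for token in set(tokenize(text)):
        new_weight = updated.get(token, 0.0) + direction * step
        updated[token] = max(-1.0, min(1.0, new_weight))
    return updated


# Toy lexicon for a hypothetical public-health topic.
lexicon = {"vaccine": 1.0, "outbreak": 1.0, "football": -1.0}
page = "Vaccine outbreak news"
relevance = score_page(page, lexicon)
lexicon_after = self_learn(page, lexicon)
```

In this sketch, a word such as "news" that co-occurs with known related words on a confidently scored page enters the lexicon with a small positive weight. This is one simple way in which historical information could carry over to later pages or to a new domain.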