A retrieval method for similar Q&A articles of web bulletin board with relevance index derived from commercial web search engine
This paper presents a retrieval method for BBS (Bulletin Board System) articles that uses a relevance index between the retrieval query and an article. Keyword-based retrieval alone is limited in how far it can narrow down the candidate articles: most BBS articles contain many different keywords, and an article can match the query through a combination of keywords that are unrelated to it, which produces unexpected results. Most BBSs, however, share a characteristic structure, the so-called "thread", which consists of one question article and a set of answer articles. Based on this structure, our method calculates a relevance index for each part of an article, using an association index among words derived from Internet search engine results. We applied the method to a practical word-of-mouth BBS and compared it with retrieval based on the cosine similarity index in the word-vector space. The results show that our method achieved 30% better retrieval accuracy.
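The two scoring approaches contrasted above can be illustrated with a minimal Python sketch. The cosine similarity baseline is the standard bag-of-words form. The abstract does not give the formula for the association index, so the Jaccard-style ratio over search engine hit counts used here is only an assumption, as is the way per-word associations are averaged into a relevance index. The function names and the `hit_counts` / `pair_hit_counts` dictionaries are hypothetical and stand in for counts that would be obtained from a search engine.

```python
import math
from collections import Counter


def cosine_similarity(query_words, article_words):
    """Baseline: cosine similarity of two bag-of-words term-frequency vectors."""
    q = Counter(query_words)
    a = Counter(article_words)
    dot = sum(q[w] * a[w] for w in q.keys() & a.keys())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in a.values())))
    return dot / norm if norm else 0.0


def association_index(hits_a, hits_b, hits_both):
    """ASSUMPTION: Jaccard-style association between two words, computed from
    the hit counts of each word alone and of both words together.
    The paper's actual definition may differ."""
    union = hits_a + hits_b - hits_both
    return hits_both / union if union > 0 else 0.0


def relevance_index(query_words, part_words, hit_counts, pair_hit_counts):
    """ASSUMPTION: relevance of one article part (e.g. the question or an
    answer of a thread) = mean, over query words, of the strongest
    association with any word in that part."""
    scores = []
    for q in query_words:
        best = 0.0
        for w in part_words:
            if q == w:
                best = 1.0  # identical words are treated as fully associated
                break
            both = pair_hit_counts.get(frozenset((q, w)), 0)
            best = max(best, association_index(hit_counts.get(q, 0),
                                               hit_counts.get(w, 0),
                                               both))
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Unlike the cosine baseline, which scores zero when the query and the article part share no words, the association-based score can still be positive when the words frequently co-occur on the web, which is the intuition the abstract describes.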