Paper Information - Finding Themes in Medline Documents: Probabilistic Similarity Search

Large on-line document databases, such as Medline, pose a major challenge of retrieving the few documents most relevant to the user’s needs, while minimizing the return rate of nonrelevant documents. Retrieval of documents similar to a user-provided example document is a promising query paradigm towards meeting this goal. We present a new theme-based probabilistic approach for finding documents relevant to a given query document, and summarizing their contents. Preliminary experiments conducted over a subset of Medline documents related to AIDS demonstrate the effectiveness of our approach.

Hagit Shatkay | W. John Wilbur
