Did You Know? A Rule-Based Approach to Finding Similar Questions on Online Health Forums
This paper describes our system submitted to the ICHI 2015 Healthcare Data Analytics Challenge. Given a relatively large corpus of questions posted by users on online health forums, the task is to find, for a newly posted question (the query question), the three most similar questions in the corpus. Our system employs Elasticsearch, a search server based on Lucene, at its core. The corpus of existing questions is indexed with n-grams. To find the most similar questions, the query question is rewritten by rules into a keyword-based query that draws on multiple text components, including the title, key phrases, and noun phrases extracted from the question content.
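As an illustration of the pipeline outlined above, the following is a minimal sketch rather than the submitted system. It builds two plain dictionaries in the Elasticsearch JSON format: index settings with a word n-gram (shingle) analyzer, and a rule-based rewrite of a query question into a keyword query. The field names (`title`, `content`), the boost values, the choice of word-level shingles instead of character n-grams, and the function names are all assumptions made for the example; the abstract does not specify them. Key-phrase and noun-phrase extraction is assumed to have happened upstream.

```python
from typing import Dict, List


def build_index_settings(max_shingle_size: int = 3) -> Dict:
    """Index settings that analyze text into word n-grams (shingles).

    Assumption: word-level shingles; the abstract only says "n-grams".
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "word_ngrams": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": max_shingle_size,
                        "output_unigrams": True,
                    }
                },
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "word_ngrams"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field names for the indexed questions.
                "title": {"type": "text", "analyzer": "ngram_analyzer"},
                "content": {"type": "text", "analyzer": "ngram_analyzer"},
            }
        },
    }


def rewrite_query(
    title: str,
    key_phrases: List[str],
    noun_phrases: List[str],
    size: int = 3,
) -> Dict:
    """Rewrite a query question into a keyword-based bool query.

    Illustrative rules (boosts are assumptions, not the paper's values):
    the title is matched against both fields, key phrases are matched
    as exact phrases with a higher weight, noun phrases as plain terms.
    """
    should = []
    if title.strip():
        should.append(
            {"multi_match": {"query": title, "fields": ["title^2", "content"]}}
        )
    for phrase in key_phrases:
        should.append(
            {"match_phrase": {"content": {"query": phrase, "boost": 1.5}}}
        )
    for phrase in noun_phrases:
        should.append({"match": {"content": {"query": phrase}}})
    return {
        "size": size,  # the task asks for the three most similar questions
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }
```

The two dictionaries could be passed to any Elasticsearch client as the index-creation body and the search body, respectively. Keeping the rewrite as a pure function makes each rule easy to inspect and adjust independently of the search server.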