Consumer access to health information is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer-oriented health information, which, however, covers slang medical terminology poorly. In this paper, we present a Bayesian model that automatically aligns documents written in different dialects (slang, common, and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixture of dialects in each document and the joint influence of dialects and topics on word selection. Simulations that use consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in retrieval relevance, measured by nDCG@5, over an LDA baseline.
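For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how nDCG@5 can be computed for one ranked result list. It assumes the common linear-gain form of DCG with a log2 rank discount, and it normalizes against the ideal ordering of the same judged results; the exact variant used in the evaluation is not stated here, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results.

    Assumption: linear gain (rel) with a log2(rank + 1) discount,
    where rank starts at 1 for the first result.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """Normalized DCG: DCG of the ranking divided by DCG of the ideal ranking.

    Assumption: the ideal ranking is the same judged results sorted by
    decreasing relevance. Returns 0.0 when no result is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal
```

Here `relevances` holds the graded relevance judgments of the retrieved documents in ranked order. A per-query score computed this way would typically be averaged over all queries before two systems are compared.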