Using Maximum Entropy Model for Concept-Based Genomic Information Retrieval

Genomic information retrieval deals with a huge volume of highly specific information and faces many problems, such as synonymy, long term names, and the rapidly growing size of the literature. In this paper, we use a concept-based model for indexing and querying rather than a translation model or traditional query expansion techniques. We adopt an extraction tool, MaxMatcher, which uses Unified Medical Language System (UMLS) concepts to extract concepts from text. After extraction, some words or phrases are mapped to two or more concept IDs, so we use a Maximum Entropy model to disambiguate these ambiguous words and phrases. We conducted a comparative experiment on the TREC 2007 Genomics Track data.
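The disambiguation step can be illustrated with a small conditional maximum entropy classifier that picks one concept ID for an ambiguous term from its surrounding words. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses bag-of-context-words features and plain stochastic gradient ascent instead of iterative scaling or a quasi-Newton optimizer. It uses the hypothetical concept IDs `C_GENE` and `C_ANIMAL` for an ambiguous term such as "CAT" in place of real UMLS identifiers. The toy training contexts are made up for illustration.

```python
import math
from collections import defaultdict

# Toy, hypothetical training contexts for one ambiguous term (e.g. "CAT").
# The labels are placeholder concept IDs, not real UMLS identifiers.
TRAIN = [
    (["catalase", "expression", "in", "liver"], "C_GENE"),
    (["mutation", "of", "gene", "activity"], "C_GENE"),
    (["enzyme", "activity", "oxidative", "stress"], "C_GENE"),
    (["domestic", "pet", "fur"], "C_ANIMAL"),
    (["feline", "pet", "owner"], "C_ANIMAL"),
    (["veterinary", "domestic", "animal"], "C_ANIMAL"),
]


def features(context):
    """Binary features: a bias plus one indicator per context word."""
    return ["bias"] + [f"w={w.lower()}" for w in context]


class MaxEntDisambiguator:
    """Conditional maxent model: p(c | x) is proportional to exp(sum_f w[f, c])."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # weights keyed by (feature, label)

    def prob(self, context):
        feats = features(context)
        scores = {c: sum(self.w[(f, c)] for f in feats) for c in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def train(self, data, epochs=100, lr=0.5):
        for _ in range(epochs):
            for context, gold in data:
                p = self.prob(context)
                for c in self.labels:
                    # Log-likelihood gradient: observed minus expected feature value.
                    g = (1.0 if c == gold else 0.0) - p[c]
                    for f in features(context):
                        self.w[(f, c)] += lr * g
        return self

    def predict(self, context):
        p = self.prob(context)
        return max(self.labels, key=lambda c: p[c])


if __name__ == "__main__":
    model = MaxEntDisambiguator(["C_GENE", "C_ANIMAL"]).train(TRAIN)
    print(model.predict(["gene", "expression", "stress"]))  # -> C_GENE
    print(model.predict(["pet", "domestic"]))  # -> C_ANIMAL
```

In a full system, the label set for each ambiguous term would be the candidate concept IDs that the extractor returned for it, and richer features (neighboring concepts, part-of-speech tags) could be added in the same way.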