LingPipe for 99.99% Recall of Gene Mentions
Text data mining over biomedical research literature is a needle-in-a-haystack problem. We contend that first-best methods performing at 90% F-measure are insufficient, especially given that performance is much worse for “unseen” phrases. In this paper, we recast the problem as one of n-best search rather than first-best database population. We describe LingPipe’s HMM and character language model-based chunkers, which extract mentions of genes in unseen MEDLINE abstracts at 99.99% recall with greater than 50% mean-average precision. We provide evaluation results in terms of received precision-recall curves on unseen data.
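To make the n-best framing concrete, the following minimal sketch ranks candidate mentions by confidence and scores the ranking by average precision and by recall at a cutoff. It is an illustration only, not LingPipe's API: the function names, candidate strings, and confidence values are all hypothetical, and matching is simplified to exact string identity.

```python
from typing import List, Set, Tuple


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average of the precision values at each rank where a gold mention appears."""
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            precision_sum += hits / rank
    # Gold mentions never returned contribute zero precision.
    return precision_sum / len(gold) if gold else 0.0


def recall_at(ranked: List[str], gold: Set[str], n: int) -> float:
    """Fraction of gold mentions found among the top n candidates."""
    if not gold:
        return 0.0
    return len(set(ranked[:n]) & gold) / len(gold)


# Hypothetical n-best output: (candidate mention, confidence) pairs.
scored: List[Tuple[str, float]] = [
    ("p53", 0.98),
    ("the", 0.05),
    ("BRCA1", 0.91),
    ("cell", 0.40),
    ("TNF-alpha", 0.35),
]
# Hypothetical gold-standard gene mentions for the same text.
gold: Set[str] = {"p53", "BRCA1", "TNF-alpha"}

# Rank candidates from most to least confident.
ranked = [mention for mention, _ in sorted(scored, key=lambda pair: pair[1], reverse=True)]

print(average_precision(ranked, gold))
print(recall_at(ranked, gold, 2), recall_at(ranked, gold, 4))
```

Averaging `average_precision` over a set of documents or queries gives mean average precision. Lengthening the n-best list raises recall at the cost of more low-confidence candidates, which is the trade-off a precision-recall curve traces.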