Optimizing Feature Representation for Automated Systematic Review Work Prioritization

Automated document classification can be a valuable tool for making the creation and updating of systematic reviews (SRs) for evidence-based medicine more efficient. One way it can help is through work prioritization: given a set of documents, order them so that the documents most likely to be useful appear first. We evaluated several alternative classification feature sets, including unigram, n-gram, MeSH, and natural language processing (NLP) features, on 15 SR tasks, using the area under the receiver operating characteristic (ROC) curve as the measure of performance. We also examined the impact of topic-specific training data compared with general SR inclusion data. The best feature set combined n-gram and MeSH features. NLP-based features did not improve performance. Furthermore, topic-specific training data usually provided a significant performance gain over more general SR training data.
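As a rough illustration of two ideas in the abstract, combining n-gram and MeSH features into one representation and measuring a prioritized ordering by the area under the ROC curve, here is a minimal Python sketch. It is not the author's implementation. The abstract does not specify tokenization, n-gram length, or how the feature sets were merged, so the sketch assumes lowercase word tokens, counts for unigrams and bigrams, and a binary indicator for each MeSH heading. The function names (`ngram_features`, `combined_features`, `roc_auc`, `prioritize`) are hypothetical, and the per-document scores are assumed to come from some trained classifier that is not shown.

```python
import re
from typing import Dict, List, Sequence


def ngram_features(text: str, max_n: int = 2) -> Dict[str, int]:
    """Count word n-grams of length 1..max_n (unigrams plus longer n-grams).

    Assumption: simple lowercase alphanumeric tokenization.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts: Dict[str, int] = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            key = "ng:" + " ".join(tokens[i:i + n])
            counts[key] = counts.get(key, 0) + 1
    return counts


def combined_features(text: str, mesh_terms: Sequence[str], max_n: int = 2) -> Dict[str, int]:
    """Merge n-gram counts with binary MeSH-term indicators into one sparse vector.

    Assumption: MeSH headings are supplied per document (e.g. from MEDLINE records).
    The "ng:" and "mesh:" prefixes keep the two feature spaces from colliding.
    """
    feats = ngram_features(text, max_n)
    for term in mesh_terms:
        feats["mesh:" + term.lower()] = 1
    return feats


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive (included) document
    outranks a random negative (excluded) one; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prioritize(doc_ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Work prioritization: order documents so the highest-scoring come first."""
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]


if __name__ == "__main__":
    feats = combined_features("Statin therapy reduces risk", ["Hypertension"])
    print(sorted(feats))
    print(prioritize(["a", "b", "c"], [0.2, 0.9, 0.5]))
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

For example, `roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives 0.75, because three of the four (included, excluded) pairs are ordered correctly. The pairwise loop costs O(P × N) comparisons; for large candidate sets, a rank-based computation of the same statistic is the usual choice.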

Aaron M. Cohen