A method of extracting the number of trial participants from abstracts describing randomized controlled trials

We have developed a method for extracting the number of trial participants from abstracts describing randomized controlled trials (RCTs). The number of participants may serve as one indication of a trial's reliability. The method relies on statistical natural language processing: a binary supervised classifier based on a support vector machine identifies the number of interest. The method was evaluated on 223 abstracts in which the number of trial participants had been identified manually to serve as a gold standard. Automatic extraction produced 2 false-positive and 19 false-negative classifications. The algorithm extracted the number of trial participants with an accuracy of 97% and an F-measure of 0.84. It may improve the selection of relevant articles for question answering, and hence may assist in decision-making.
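The accuracy and F-measure quoted above are both derived from confusion-matrix counts. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The false-positive and false-negative counts are the ones reported in the abstract; the true-positive and true-negative counts are not reported, so the values used here are hypothetical placeholders and the output will not reproduce the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary classification run."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f_measure(self) -> float:
        # Balanced F-measure: harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


# fp and fn come from the abstract; tp and tn are HYPOTHETICAL placeholders,
# chosen only to make the example runnable.
example = Confusion(tp=100, fp=2, fn=19, tn=1000)

print(f"precision = {example.precision():.3f}")
print(f"recall    = {example.recall():.3f}")
print(f"F-measure = {example.f_measure():.3f}")
print(f"accuracy  = {example.accuracy():.3f}")
```

Accuracy counts true negatives and the F-measure does not, so the two can diverge when negative examples greatly outnumber positive ones.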
