Mining characteristics of epidemiological studies from Medline: a case study in obesity

Background: The health sciences literature incorporates a relatively large subset of epidemiological studies that focus on population-level findings, including various determinants, outcomes and correlations. Extracting structured information about those characteristics would be useful for a more complete understanding of diseases and for meta-analyses and systematic reviews.

Results: We present an information extraction approach that enables users to identify key characteristics of epidemiological studies from MEDLINE abstracts. It extracts six types of epidemiological characteristic: design of the study, population that has been studied, exposure, outcome, covariates and effect size. We have developed a generic rule-based approach that has been designed according to semantic patterns observed in text, and tested it in the domain of obesity. Identified exposure, outcome and covariate concepts are clustered into health-related groups of interest. On a manually annotated test corpus of 60 epidemiological abstracts, the system achieved precision, recall and F-score between 79-100%, 80-100% and 82-96% respectively. We report the results of applying the method to a large-scale epidemiological corpus related to obesity.

Conclusions: The experiments suggest that the proposed approach could identify key epidemiological characteristics associated with a complex clinical problem from related abstracts. When integrated over the literature, the extracted data can be used to provide a more complete picture of epidemiological efforts, and thus support understanding via meta-analysis and systematic reviews.
