Extracting Eligibility Criteria from the Narrative Text of Scientific Research Articles

The eligibility criteria in the hundreds of research papers based on the National Health Insurance Research Database (NHIRD) share similar constituent elements, such as demographic characteristics and diagnostic codes. Because these criteria statements vary, studies of the same disease can report different results. A tool for analyzing narrative patterns would therefore help summarize the knowledge implicitly contained in the eligibility criteria. In this study, we developed a series of R-based text-processing methods to extract the narrative eligibility criteria from NHIRD papers. The methods simplify article titles and content paragraphs, identify medical concepts and abbreviations, and then detect basic demographic characteristics and ICD-9-CM diagnosis codes. Although study type identification still leaves room for improvement, the high performance in classifying study type, detecting age restrictions, and extracting ICD-9-CM codes nonetheless demonstrates the system's usefulness for analyzing eligibility criteria.
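The abstract does not give the actual extraction rules, and the described system is written in R. The following Python sketch illustrates one plausible rule-based approach to the last two steps, detecting age restrictions and ICD-9-CM codes in a methods paragraph. Every name in it (split_sentences, extract_criteria, ICD9_CODE, AGE_RANGE, AGE_MIN) and every pattern is a hypothetical assumption, not the authors' method. Codes are collected only from sentences that mention "ICD", which avoids matching counts such as "350 patients".

```python
import re

# Hypothetical patterns for illustration only; not the authors' R rules.
# ICD-9-CM: numeric codes (e.g. 250.00), V codes (e.g. V45.81) and
# E (external cause) codes, allowing "x" wildcards such as 250.x0.
ICD9_CODE = re.compile(
    r"\b(?:V\d{2}(?:\.[\dxX]{1,2})?|E\d{3}(?:\.[\dxX])?|\d{3}(?:\.[\dxX]{1,2})?)\b"
)
# Age ranges such as "aged 18-65" or "age between 20 and 40".
AGE_RANGE = re.compile(
    r"\baged?\s+(?:between\s+)?(\d{1,3})\s*(?:-|–|to|and)\s*(\d{1,3})\b", re.I
)
# Lower age bounds such as "aged >= 40" or "20 years or older".
AGE_MIN = re.compile(
    r"\baged?\s*(?:≥|>=|over|above|at least)\s*(\d{1,3})\b"
    r"|\b(\d{1,3})\s*years?\s+(?:of age\s+)?(?:or|and)\s+(?:older|above|over)\b",
    re.I,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_criteria(text):
    """Return ICD-9-CM codes and age bounds found in a methods paragraph."""
    codes, min_age, max_age = [], None, None
    for sentence in split_sentences(text):
        # Only sentences that mention ICD are scanned for codes.
        if "ICD" in sentence:
            for code in ICD9_CODE.findall(sentence):
                if code not in codes:  # keep first-seen order, drop duplicates
                    codes.append(code)
        match = AGE_RANGE.search(sentence)
        if match:
            min_age, max_age = int(match.group(1)), int(match.group(2))
            continue
        match = AGE_MIN.search(sentence)
        if match:
            min_age = int(match.group(1) or match.group(2))
    return {"icd9_codes": codes, "min_age": min_age, "max_age": max_age}


sample = (
    "We identified patients newly diagnosed with type 2 diabetes "
    "(ICD-9-CM 250.x0, 250.x2) between 2000 and 2008. "
    "Patients aged 20 years or older were included. "
    "Those with a history of stroke (ICD-9-CM 430-438) or V45.81 were excluded."
)
print(extract_criteria(sample))
# {'icd9_codes': ['250.x0', '250.x2', '430', '438', 'V45.81'], 'min_age': 20, 'max_age': None}
```

This sketch does not expand code ranges such as 430-438 into their member codes, and it does not separate inclusion criteria from exclusion criteria. A fuller system would need both, along with the title and paragraph simplification and the concept and abbreviation recognition steps that precede extraction.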