Semi-automatic curation of chronic obstructive pulmonary disease phenotypes using Argo.

Argo is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic curation of information from literature. It enables its technical users to build their own customised solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo’s graphical annotation interface, domain experts can then make use of the workflows’ automatically generated output to curate information of interest. As part of our participation in the User Interactive Task of BioCreative V, we asked five domain experts to utilise Argo for the curation of phenotypes relevant to chronic obstructive pulmonary disease (COPD). Specifically, they carried out three curation subtasks over passages drawn from full-text PubMed Central papers relevant to COPD: (1) the markup of phenotypic mentions in text, e.g., medical conditions, signs or symptoms, drugs and proteins; (2) the linking of mentions to relevant ontologies, i.e., normalisation; and (3) the annotation of relations between COPD and other mentions. Analysis of the resulting annotations shows that text mining-assisted curation increased throughput from 9 to 14 curated passages per hour. Inter-annotator agreement, measured on concept annotations, averaged an F-score of 68.12%. To evaluate the performance of the automatic curation workflow, we compared the annotations it produced against those provided by one of the domain experts and obtained an F-score of 66.97%.
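
For readers unfamiliar with how an F-score can be derived from two sets of concept annotations, the following is a minimal sketch. It assumes exact matching on character offsets and concept type, which may differ from the matching criteria actually used in the evaluation; the `Annotation` type, the `f_score` function and the example labels are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Set


class Annotation(NamedTuple):
    """A concept mention: character offsets plus a concept type (hypothetical schema)."""
    start: int
    end: int
    label: str


def f_score(reference: Set[Annotation], candidate: Set[Annotation]) -> float:
    """Balanced F-score of `candidate` against `reference` under exact matching.

    Assumption: an annotation counts as correct only if its offsets and
    label are identical to an annotation in the reference set.
    """
    if not reference or not candidate:
        return 0.0
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one passage, not data from the study.
expert = {
    Annotation(0, 4, "MedicalCondition"),
    Annotation(10, 18, "Drug"),
    Annotation(25, 31, "Protein"),
}
workflow = {
    Annotation(0, 4, "MedicalCondition"),
    Annotation(10, 18, "Drug"),
    Annotation(40, 45, "Drug"),
}

score = f_score(expert, workflow)
print(f"F-score: {score:.4f}")
```

Because the balanced F-score is symmetric in its two arguments, the same function can serve both for pairwise inter-annotator agreement (averaged over annotator pairs) and for scoring the automatic workflow against a single expert.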