DeepPPPred: Deep Ensemble Learning with Transformers, Recurrent and Convolutional Neural Networks for Human Protein-Phenotype Co-mention Classification
The extensive body of biomedical literature is arguably the best source of knowledge on the latest scientific findings and fundamental problems for the biological and clinical communities. However, these articles consist of unstructured text, so this valuable knowledge may remain hidden without manual curation, which is tedious and time-consuming given the rapid growth in publications. The relationships between human proteins and the phenotypic abnormalities associated with human disease are one such area of valuable knowledge. This situation calls for accurate computational tools that can automatically infer these associations from text, helping human curators expedite their triage and information extraction tasks. This work develops DeepPPPred, a deep ensemble learning model for protein-phenotype co-mention classification at the sentence level. In particular, DeepPPPred combines Support Vector Machines, Transformer models, Recurrent Neural Networks, and Convolutional Neural Networks via stacking. Experimental results on a manually curated gold-standard dataset demonstrate that DeepPPPred achieves state-of-the-art performance, outperforming all of its competitors. This is the first study to develop deep learning models for the problem of classifying human protein-phenotype co-mentions. Our findings have implications for the biological and clinical communities, as well as for text mining and natural language processing developers working on biomedical relation extraction.
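To make the idea of stacking concrete, the following is a minimal sketch, not the authors' implementation. It assumes four base classifiers have already been trained and that each outputs a probability that a sentence expresses a true protein-phenotype association. The scores, the labels, and the names `BASE_SCORES`, `fit_meta_learner`, and `predict` are all hypothetical, and the choice of logistic regression as the meta-learner is an assumption made for illustration; the abstract does not say which meta-learner DeepPPPred uses.

```python
import math

# Hypothetical probabilities from four already-trained base classifiers
# (columns: SVM, Transformer, RNN, CNN), one row per sentence. In a real
# stacking setup these would be out-of-fold predictions on training data;
# here they are made-up toy values.
BASE_SCORES = [
    [0.8, 0.9, 0.7, 0.6],
    [0.4, 0.9, 0.8, 0.7],
    [0.7, 0.8, 0.3, 0.8],
    [0.9, 0.7, 0.6, 0.4],
    [0.2, 0.1, 0.3, 0.4],
    [0.6, 0.2, 0.1, 0.3],
    [0.3, 0.1, 0.6, 0.2],
    [0.1, 0.3, 0.2, 0.5],
]
# Toy labels: 1 = the co-mention expresses a real association, 0 = it does not.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(rows, labels, epochs=5000, lr=0.5):
    """Fit a logistic-regression meta-learner with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, row):
    """Return 1 if the stacked model predicts a true association, else 0."""
    z = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


weights, bias = fit_meta_learner(BASE_SCORES, LABELS)
print([predict(weights, bias, row) for row in BASE_SCORES])
```

The meta-learner sees only the base models' outputs, never the sentence itself, so it can learn how much to trust each model. For instance, the second toy sentence is still classified as positive even though the hypothetical SVM score for it is below 0.5, because the other three scores outweigh it.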