Finding Text-Supported Gene-to-Disease Co-appearances with MOPED-Digger.

Gene/disease associations are a critical part of exploring disease causes and, ultimately, cures, yet the publications that might provide such information are too numerous to review manually. We present a software utility, MOPED-Digger, that enables focused human assessment of the literature by applying natural language processing (NLP) to search for customized lists of genes and diseases in the titles and abstracts of biomedical publications. The results are ranked lists of gene/disease co-appearances and the publications that support them. Analysis of 18,159,237 PubMed titles and abstracts yielded 1,796,799 gene/disease co-appearances, which can be used to focus attention on the most promising publications for a possible gene/disease association. An integrated score is provided to assess evidence that is presented broadly across publications, which helps capture more tenuous connections. MOPED-Digger is written in Java and uses the Apache Lucene 5.0 library. The utility runs as a command-line program with a variety of user options and is freely available for download from the MOPED 3.0 website (moped.proteinspire.org).
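To make the notion of a ranked co-appearance list concrete, the following is a minimal Java sketch. It is not MOPED-Digger's implementation and does not use Lucene: the class name `CoAppearanceSketch`, its methods, and the sample texts are all hypothetical. It assumes plain whole-word, case-insensitive matching of user-supplied gene and disease terms, and it ranks pairs only by the number of supporting texts. It makes no attempt to reproduce the integrated score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the MOPED-Digger code or its API.
final class CoAppearanceSketch {

    // Assumption: a term "appears" if it matches as a whole word, ignoring case.
    static boolean mentions(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    // Maps each "gene|disease" pair to the indices of the texts mentioning both.
    static Map<String, List<Integer>> coAppearances(List<String> texts,
                                                    List<String> genes,
                                                    List<String> diseases) {
        Map<String, List<Integer>> support = new LinkedHashMap<>();
        for (int i = 0; i < texts.size(); i++) {
            String text = texts.get(i);
            for (String gene : genes) {
                if (!mentions(text, gene)) {
                    continue;
                }
                for (String disease : diseases) {
                    if (mentions(text, disease)) {
                        support.computeIfAbsent(gene + "|" + disease,
                                k -> new ArrayList<>()).add(i);
                    }
                }
            }
        }
        return support;
    }

    // Orders pairs by number of supporting texts, largest first (stable for ties).
    static List<Map.Entry<String, List<Integer>>> ranked(
            Map<String, List<Integer>> support) {
        List<Map.Entry<String, List<Integer>>> entries =
                new ArrayList<>(support.entrySet());
        entries.sort((a, b) ->
                Integer.compare(b.getValue().size(), a.getValue().size()));
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample texts standing in for titles/abstracts.
        List<String> texts = Arrays.asList(
                "BRCA1 mutations and breast cancer risk.",
                "Role of BRCA1 in hereditary breast cancer and ovarian cancer.",
                "TP53 in ovarian cancer.");
        List<String> genes = Arrays.asList("BRCA1", "TP53");
        List<String> diseases = Arrays.asList("breast cancer", "ovarian cancer");

        for (Map.Entry<String, List<Integer>> e
                : ranked(coAppearances(texts, genes, diseases))) {
            System.out.println(e.getKey() + " supported by texts " + e.getValue());
        }
    }
}
```

Keeping the indices of the supporting texts alongside each pair mirrors the abstract's point that every co-appearance stays linked to the publications behind it, so a reviewer can go straight from a high-ranked pair to the texts worth reading.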
