Constructing a database for the relations between CNV and human genetic diseases via systematic text mining

Background: The detection and interpretation of copy number variants (CNVs) are of clinical importance in genetic testing. Clinical geneticists already use several databases and web services to interpret the medical relevance of CNVs identified in patients. However, geneticists and physicians often want the original literature context for more detail, especially for rare CNVs that are not included in these databases.

Results: The resulting CNVdigest database includes 440,485 sentences describing CNV-disease relationships, involving a total of 1582 CNVs and 2425 diseases. Sentences describing CNV-disease correlations are indexed in CNVdigest, with CNV mentions and disease mentions annotated.

Conclusions: In this paper, we use a systematic text mining method to construct a database of the relationships between CNVs and diseases. Based on that, we also developed a concise front end that provides a user-friendly web interface for querying and analyzing CNV-disease associations. The resulting system is publicly available at http://cnv.gtxlab.com/.
