Global Genetics Research in Prostate Cancer: A Text Mining and Computational Network Theory Approach

Prostate cancer is the most common cancer in men in Finland and the second most common worldwide. In this paper, we analyze almost 150,000 published papers about prostate cancer, authored by tens of thousands of scientists worldwide, using an integrated text mining and computational network theory approach. We demonstrate how to integrate text mining with network analysis to investigate the research contributions of countries and the collaborations within and between them. Furthermore, we study the time evolution of individually and collectively studied genes. Finally, we investigate the collaboration network of Finland and compare the genes studied there with those studied globally in prostate cancer genetics. Overall, our results provide a global overview of prostate cancer research in genetics. In addition, we present a specific discussion for Finland. Our results shed light on trends over the last 30 years and are useful for translational researchers across the full range from genetics to public health management and health policy.
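To make the integration of text mining and network analysis more concrete, here is a minimal sketch of how text-mined paper records might be turned into a country collaboration network, per-country contribution counts, and per-year gene counts. It assumes that the text-mining step has already extracted, for each paper, its publication year, its author-affiliation countries, and the gene symbols it mentions. The record layout (`year`, `countries`, `genes`), the toy data, and all function names are hypothetical illustrations, not the authors' actual pipeline.

```python
from collections import Counter, defaultdict
from itertools import combinations

# HYPOTHETICAL toy records: one dict per paper, as a text-mining step might emit.
# Field names and values are illustrative assumptions, not real extracted data.
papers = [
    {"year": 2015, "countries": ["Finland", "USA"], "genes": ["AR", "ERG"]},
    {"year": 2015, "countries": ["Finland"], "genes": ["AR"]},
    {"year": 2016, "countries": ["USA", "China", "Finland"], "genes": ["KLK3", "AR"]},
    {"year": 2016, "countries": ["China"], "genes": ["KLK3"]},
]


def country_contributions(papers):
    """Number of papers with at least one author from each country."""
    counts = Counter()
    for paper in papers:
        counts.update(set(paper["countries"]))  # each paper counted once per country
    return counts


def collaboration_edges(papers):
    """Weighted undirected country network: weight = number of co-authored papers."""
    edges = Counter()
    for paper in papers:
        # sorted() gives a canonical order so (A, B) and (B, A) become one edge
        for a, b in combinations(sorted(set(paper["countries"])), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges, country):
    """Collaboration partners of one country, with the shared-paper weights."""
    partners = {}
    for (a, b), weight in edges.items():
        if a == country:
            partners[b] = weight
        elif b == country:
            partners[a] = weight
    return partners


def gene_mentions_by_year(papers):
    """Per-year count of papers mentioning each gene (time evolution of genes)."""
    by_year = defaultdict(Counter)
    for paper in papers:
        by_year[paper["year"]].update(set(paper["genes"]))
    return dict(by_year)


def genes_for_country(papers, country):
    """Gene counts restricted to papers with at least one author from `country`."""
    counts = Counter()
    for paper in papers:
        if country in paper["countries"]:
            counts.update(set(paper["genes"]))
    return counts


if __name__ == "__main__":
    edges = collaboration_edges(papers)
    print(neighbours(edges, "Finland"))
    print(genes_for_country(papers, "Finland").most_common())
```

On the toy records, `neighbours(collaboration_edges(papers), "Finland")` returns `{'USA': 2, 'China': 1}`. Each paper is counted once per country pair, so several co-authors from the same country do not inflate an edge weight. Comparing `genes_for_country(papers, "Finland")` with the corresponding global counts is one simple way to contrast nationally and globally studied genes.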
