Molecular profiling of thyroid cancer subtypes using large-scale text mining

Background: Thyroid cancer is the most common endocrine tumor, and its incidence is steadily increasing. It is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. Identifying the most relevant genes and biological pathways reported in the thyroid cancer literature is vital for understanding the disease and developing targeted therapeutics.

Results: We developed a large-scale text mining system to generate a molecular profile of thyroid cancer subtypes. The system first applies a subtype classification method to the thyroid cancer literature, using a scoring scheme to assign subtypes to articles. We evaluated the classification method on a gold standard derived from the PubMed Supplementary Concept annotations, achieving a micro-average F1-score of 85.9% for primary subtypes. We then used the subtype classification results to extract genes and pathways associated with different thyroid cancer subtypes. This identified important genes and pathways, including some that are missing from current manually annotated databases or the most recent review articles.

Conclusions: Identification of key genes and pathways plays a central role in understanding the molecular biology of thyroid cancer. Integrating subtype context can allow prioritized screening for diagnostic biomarkers and novel molecular targeted therapeutics. Source code used for this study is freely available online at https://github.com/chengkun-wu/GenesThyCan.
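As an illustration of the evaluation measure reported in the Results, the following is a minimal sketch of a micro-averaged F1-score for multi-label subtype assignment. It is not taken from the study's source code: the function name `micro_f1`, the article identifiers, and the toy labels are all hypothetical, and the sketch assumes that each article carries a set of subtype labels in both the gold standard and the system output.

```python
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all articles and subtypes, then compute a single F1 value."""
    tp = fp = fn = 0
    for article_id in gold.keys() | predicted.keys():
        g = gold.get(article_id, set())
        p = predicted.get(article_id, set())
        tp += len(g & p)   # subtypes assigned correctly
        fp += len(p - g)   # subtypes assigned but not in the gold standard
        fn += len(g - p)   # gold-standard subtypes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (made-up article IDs and labels, for illustration only).
gold = {
    "a1": {"papillary"},
    "a2": {"follicular", "papillary"},
    "a3": {"medullary"},
}
predicted = {
    "a1": {"papillary"},
    "a2": {"papillary"},
    "a3": {"anaplastic"},
}

print(f"micro-F1 = {micro_f1(gold, predicted):.3f}")
```

Because counts are pooled before the score is computed, frequent subtypes weigh more heavily in a micro-average than rare ones; a macro-average would instead compute F1 per subtype and take the mean.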
