Text Mining Approach to Analyse the Relation between Obesity and Breast Cancer Data

Biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Literature data collected from publications must be processed with text mining approaches to extract information and transform it into an understandable structure. Text mining refers to the process of deriving high-quality information from text by finding relationships between entities that do not show direct associations. As an example of this approach, we examine the link between two diseases, breast cancer and obesity. Obesity is known to be associated with cancer mortality, but little is known about the link between lifetime changes in the BMI of obese persons and cancer mortality in both males and females. In this article, literature data on obesity and breast cancer were obtained from the PubMed database. We then applied methods based on groups of common genes and keywords, together with their frequency of occurrence in the data, to establish a relation between obesity and breast cancer, and visualized the results using pie charts and bar graphs. From this analysis we identified one gene linking the two diseases, which was validated using statistical analysis and the DiseaseConnect web server. We also propose eight common high-frequency keywords that could be used for indexing when searching the literature on obesity and breast cancer in combination.
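The idea of comparing two literature sets through shared genes and keyword frequencies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy abstracts, the placeholder gene symbols GENE1 to GENE3, the stop-word list and the helper names (keyword_frequencies, genes_mentioned, common_keywords) are all hypothetical, and ranking shared keywords by their combined raw frequency is an assumed scoring rule. In a real analysis the abstracts would come from separate PubMed queries for each disease, and gene mentions would be matched against a curated list of gene symbols.

```python
import re
from collections import Counter

# Hypothetical toy abstracts standing in for PubMed results for each disease query.
OBESITY_ABSTRACTS = [
    "Obesity and insulin resistance were associated with GENE1 expression in adipose tissue.",
    "Body mass index and obesity risk correlate with GENE2 and GENE1 variants.",
]
BREAST_CANCER_ABSTRACTS = [
    "Breast cancer risk was associated with GENE1 expression in tumour tissue.",
    "GENE3 mutations increase breast cancer risk in women.",
]

# Hypothetical placeholder gene symbols (assumption: a curated symbol list in practice).
GENE_SYMBOLS = {"GENE1", "GENE2", "GENE3"}

# Minimal, assumed stop-word list; real pipelines use a much larger one.
STOPWORDS = {"and", "the", "with", "was", "were", "in", "of", "for"}


def tokenize(text):
    """Split text into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


def keyword_frequencies(abstracts, gene_symbols):
    """Count raw keyword occurrences, excluding gene symbols and stop words."""
    counts = Counter()
    for text in abstracts:
        for tok in tokenize(text):
            if tok in gene_symbols:  # genes are tracked separately
                continue
            word = tok.lower()
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return counts


def genes_mentioned(abstracts, gene_symbols):
    """Return the set of gene symbols that appear (case-sensitively) in the abstracts."""
    found = set()
    for text in abstracts:
        for tok in tokenize(text):
            if tok in gene_symbols:
                found.add(tok)
    return found


def common_keywords(freq_a, freq_b, top_n):
    """Keywords present in both sets, ranked by combined frequency (ties alphabetical)."""
    shared = set(freq_a) & set(freq_b)
    ranked = sorted(shared, key=lambda w: (-(freq_a[w] + freq_b[w]), w))
    return [(w, freq_a[w], freq_b[w]) for w in ranked[:top_n]]


if __name__ == "__main__":
    freq_ob = keyword_frequencies(OBESITY_ABSTRACTS, GENE_SYMBOLS)
    freq_bc = keyword_frequencies(BREAST_CANCER_ABSTRACTS, GENE_SYMBOLS)
    shared_genes = (genes_mentioned(OBESITY_ABSTRACTS, GENE_SYMBOLS)
                    & genes_mentioned(BREAST_CANCER_ABSTRACTS, GENE_SYMBOLS))
    print("Common genes:", sorted(shared_genes))
    for word, n_ob, n_bc in common_keywords(freq_ob, freq_bc, top_n=8):
        print(f"{word}: obesity={n_ob}, breast cancer={n_bc}")
```

On the toy data, the script reports the placeholder GENE1 as the only gene shared by both sets, followed by the shared keywords ordered by combined count. Per-disease counts like these are the kind of input that pie charts and bar graphs would summarize. Keeping genes and keywords in separate groups avoids counting a gene symbol twice, once as a gene and once as a keyword.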
