Biomedical Literature Mining

Information: If you are reading this, you know how important it is and almost certainly look to the biomedical literature for a large part of the information you need. We work hard to find more and more biomedical literature, seeking new content from multiple sources. But, can there be too much of a good thing? Most science is reductionist by nature. It is difficult enough finding the relevant nuggets of information from 1,000 documents. It is at least ten times harder to do so from 10,000 documents. And, with 25 million biomedical journal articles and many times that of other textual information sources, we are faced with significant challenges. In this introduction, we identify some of those challenges to prepare you for the remaining chapters.
