Text mining patents for biomedical knowledge.

Biomedical text mining of scientific knowledge bases such as Medline has received much attention in recent years. Because text mining can automatically extract biomedical facts about entities such as genes, proteins, and drugs from unstructured text, it is seen as a major enabler of biomedical research and drug discovery. Research into mining biomedical patents, however, has not reached the same level of maturity as mining of the biomedical literature. Here, we review existing work and highlight the technical challenges that arise when automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.
