Open PHACTS: A Semantic Knowledge Infrastructure for Public and Commercial Drug Discovery Research

Technology advances in the last decade have led to a "digital revolution" in biomedical research. Much greater volumes of data can be generated in much less time, transforming the way researchers work [1]. Yet, for those seeking to develop new drugs to treat human disease, the task of assembling a coherent picture of existing knowledge, from molecular biology to clinical investigation, can be daunting and frustrating. Individual electronic resources remain mostly disconnected, making it difficult to follow information between them. Those that contain similar types of data can describe them very differently, compounding the confusion. It can also be difficult to understand exactly where specific facts or data points originated, or how to judge their quality or reliability. Finally, scientists routinely wish to ask questions that a given system does not allow, or questions that span multiple resources. Often the result is that the enquiry is simply abandoned, significantly diminishing the value to be gained from existing knowledge. Within pharmaceutical companies, such concerns have led to major programmes in data integration: downloading, parsing, mapping, transforming and presenting public, commercial and private data. Much of this work is redundant between companies, and significant resources could be saved by collaboration [2]. In an industry facing major economic pressures [3], the idea of combining forces to "get more for less" is very attractive and is arguably the only feasible route to dealing with the exponentially growing information landscape.

[1] Bin Chen, et al. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data, 2010, BMC Bioinformatics.

[2] J. Scannell, et al. Diagnosing the decline in pharmaceutical R&D efficiency, 2012, Nature Reviews Drug Discovery.

[3] Dieter Fensel, et al. Towards LarKC: A Platform for Web-Scale Reasoning, 2008, 2008 IEEE International Conference on Semantic Computing.

[4] Susanna-Assunta Sansone, et al. Empowering industrial research with shared biomedical vocabularies, 2011, Drug discovery today.

[5] Mark Foster, et al. Open source software in life science research, 2012.

[6] Paul T. Groth, et al. The anatomy of a nanopublication, 2010, Inf. Serv. Use.

[7] Nicole Tourigny, et al. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems, 2008, J. Biomed. Informatics.

[8] Lee Harland, et al. Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery, 2009, Nature Reviews Drug Discovery.

[9] Qian Zhu, et al. Semantic inference using chemogenomics data for drug discovery, 2011, BMC Bioinformatics.

[10] John P. Overington, et al. ChEMBL: a large-scale bioactivity database for drug discovery, 2011, Nucleic Acids Res.

[11] Chris T. A. Evelo, et al. The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services, 2010, BMC Bioinformatics.

[12] Rob W.W. Hooft, et al. The value of data, 2011, Nature Genetics.

[13] D. Kell, et al. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era, 2004, BioEssays: news and reviews in molecular, cellular and developmental biology.