WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

Background: In recent years, there has been a huge increase in the amount of publicly available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, particularly for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service-based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, WENDI (Web Engine for Non-obvious Drug Information), which attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.

Results: We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.

Conclusions: Initial testing indicates that this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.
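The aggregate-service pattern described in the Results (one query compound in, several services called, one XML document out) can be illustrated with a minimal sketch. This is not WENDI's actual implementation: the function names, the XML element names (`report`, `source`, `result`) and the two stub services are all assumptions made for illustration, and the stubs return placeholder values where a real service would perform an HTTP or SOAP call.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical stand-ins for remote services. Real services would be
# network calls; these stubs only return placeholder strings.
def similar_compounds(smiles: str) -> List[str]:
    return ["placeholder-similar-compound"]

def literature_hits(smiles: str) -> List[str]:
    return ["placeholder-publication-id"]

def aggregate(smiles: str,
              services: Dict[str, Callable[[str], List[str]]]) -> str:
    """Call each service with the query compound and merge the results
    into one XML document (element names are illustrative only)."""
    root = ET.Element("report", {"query": smiles})
    for name, call in services.items():
        section = ET.SubElement(root, "source", {"name": name})
        try:
            for item in call(smiles):
                ET.SubElement(section, "result").text = item
        except Exception as exc:
            # Record the failure so one unavailable service does not
            # prevent the other sources from being reported.
            section.set("error", str(exc))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(aggregate("CCO", {
        "similarity": similar_compounds,
        "literature": literature_hits,
    }))
```

Keeping each source in its own element, tagged with the service name, is one way a client could later compare corroborating and conflicting results across sources.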
