Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data

Abstract: The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications, released shortly after the conclusion of each experiment. We propose that this trend be accompanied by a thorough examination of data sharing priorities. We argue that the most significant immediate beneficiaries of open data are in fact chemical algorithms, which can absorb vast quantities of data and use them to present concise insights to working chemists on a scale that traditional publication methods could not achieve. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits that can be gained when open data is published correctly using unambiguous machine-readable formats.

Graphical Abstract: Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms.

Antony J. Williams | Alex M. Clark | Sean Ekins
