© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

In recent years we have seen a renewed interest in Artificial Intelligence and Machine Learning in cheminformatics, and the idea of collecting, structuring and making use of Big Data in, for example, drug discovery has become a popular topic [1, 2]. Deep Learning methods are also making their way into cheminformatics and drug discovery [3, 4], further contributing to the increased attention. Data sets relevant for Machine Learning in cheminformatics are increasing in number and size; for example, the ChEMBL database has grown from 2.4 million activity values in 2010 (ChEMBL version 02) to over 14 million activity values in 2017 (ChEMBL version 23) [5]. This growth has been propelled by the trend of organizations and companies depositing data sets in ChEMBL for public use.

An important topic in Machine Learning is quantifying the uncertainty of the predictions produced by classification and regression models. Conformal Prediction is a methodology where predictors provide information about their own accuracy and reliability [6]. In contrast to traditional Machine Learning, which delivers point estimates, Conformal Prediction yields a prediction region that contains the true value with probability equal to or higher than a predefined level of confidence. Such a prediction region can be obtained under the assumption that the observed data are exchangeable.
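To make the idea of a prediction region concrete, the following is a minimal sketch of one common variant, split (inductive) conformal regression with absolute residuals as the nonconformity score. It is an illustration under stated assumptions, not the method of any article in this collection: the function name `split_conformal_interval`, the synthetic data, and the straight-line model fitted with NumPy are all hypothetical choices made for the example.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, test_pred, confidence=0.8):
    """Return (lower, upper) prediction intervals from calibration residuals.

    Hypothetical helper: nonconformity is the absolute residual |y - y_hat|.
    """
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float)
                            - np.asarray(cal_pred, dtype=float)))
    n = scores.size
    # Rank of the calibration score that yields at least `confidence` coverage
    # for a new exchangeable example.
    k = int(np.ceil((n + 1) * confidence))
    if k > n:
        # Too few calibration examples for this confidence level:
        # the only valid region is the whole real line.
        half_width = np.inf
    else:
        half_width = scores[k - 1]
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - half_width, test_pred + half_width


# Synthetic, exchangeable data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=3000)
y = 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Disjoint proper-training, calibration and test sets.
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:1000], y[500:1000]
x_test, y_test = x[1000:], y[1000:]

# Underlying point predictor: a straight line fitted by least squares.
coef = np.polyfit(x_train, y_train, 1)
lower, upper = split_conformal_interval(
    y_cal, np.polyval(coef, x_cal), np.polyval(coef, x_test), confidence=0.8
)

# Fraction of test values that fall inside their prediction interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at 80% confidence: {coverage:.3f}")
```

The empirical coverage on the held-out examples should lie close to or above the chosen confidence level, which is the validity property described above. This simplest variant assigns every example the same interval width; intervals that widen for unusual examples require a normalized nonconformity score, which is not shown here.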
Conformal Prediction has been demonstrated in cheminformatics [7], with the attractive property that it offers a compelling alternative to applicability domain determination [8]. Using Conformal Prediction, the size of the prediction region will be larger if the compound is ‘non-conforming’ to the training set.

This article collection in Journal of Cheminformatics features three articles on applications of Conformal Prediction and Deep Learning. Larger data sets and demanding methods such as Deep Learning necessitate high-performance e-infrastructures. Ahmed et al. [9] present an iterative Conformal Prediction approach for virtual screening implemented in Apache Spark on cloud computing resources, and show how the number of docked compounds can be reduced significantly with a Machine Learning augmented approach compared to traditional dock-all strategies. Svensson et al. [10] use Conformal Prediction to predict which strategy generates the highest gain in a high-throughput screening setting. The authors show that by learning from a subset of the compound library, predictive models can infer which compounds to screen next, resulting in more efficient screening. De la Vega de León et al. [11] provide insights into how missing data affect multitask prediction methods, using Deep Learning and Bayesian probabilistic matrix factorization.

This collection in Journal of Cheminformatics includes a set of extended versions of the top-ranking papers presented at the 6th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2017) at Karolinska Institutet, Stockholm, Sweden, on June 14–16, 2017. Further, the collection was open for contributions from other authors. All papers went through a regular reviewing process and were revised where necessary prior to acceptance.
[1] George Papadatos, et al. The ChEMBL database in 2017. Nucleic Acids Res., 2016.
[2] Thomas Blaschke, et al. The rise of deep learning in drug discovery. Drug discovery today, 2018.
[3] Ulf Norinder, et al. Predicting skin sensitizers with confidence - Using conformal prediction to determine applicability domain of GARD. Toxicology in vitro : an international journal published in association with BIBRA, 2018.
[4] Andrew G. Leach, et al. Can we accelerate medicinal chemistry by augmenting the chemist with Big Data and artificial intelligence? Drug discovery today, 2018.
[5] Sean Ekins. The Next Era: Deep Learning in Pharmaceutical Research. Pharmaceutical Research, 2016.
[6] Ola Spjuth, et al. Efficient iterative virtual screening with Apache Spark and conformal prediction. Journal of Cheminformatics, 2018.
[7] Valerie J Gillet, et al. Effect of missing data on multitask prediction methods. Journal of Cheminformatics, 2018.
[8] Jürgen Bajorath, et al. Drug discovery and development in the era of Big Data. Future medicinal chemistry, 2016.
[9] Vladimir Vapnik, et al. Statistical learning theory. 1998.
[10] Andreas Bender, et al. Maximizing gain in high-throughput screening using conformal prediction. Journal of Cheminformatics, 2018.
[11] Scott Boyer, et al. Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J. Chem. Inf. Model., 2014.