In science, the relationship between methods and discovery is symbiotic. As we discover more, we are able to construct more precise and sensitive tools and methods that enable further discovery. With better lens crafting came microscopes, and with them the discovery of living cells. In the last 40 years, advances in molecular biology, statistics, and computer science have ushered in the field of bioinformatics and the genomic era.
Computational scientists enjoy developing new methods, and the community encourages them to do so. Indeed, the editorial guidelines for PLOS Computational Biology require manuscripts to apply novel methods. With so many methods available for a given task, however, choosing among them can be confusing: which method is best? And, in this context, what does "best" mean?
To help choose an appropriate method for a particular task, scientists often organize community-based challenges for the unbiased evaluation of methods in a given field. These challenges benchmark both existing and novel methods, while helping to coalesce a community and spark new ideas and collaborations.
In computational biology, the first of these challenges was arguably the Critical Assessment of protein Structure Prediction, or CASP [1], whose goal is to evaluate methods for predicting three-dimensional protein structure from amino acid sequence. The first CASP meeting was held in December 1994, following a "prediction period" during which members of the community were presented with protein amino acid sequences and asked to predict their three-dimensional structures. The chosen sequences had recently been solved by X-ray crystallography but were neither published nor released until after the community's predictions had been submitted. Since the first CASP, we have seen many successful challenges, including the Critical Assessment of Function Annotation (CAFA) for protein function prediction [2], the Critical Assessment of Genome Interpretation (CAGI) for genome interpretation [3], the Critical Assessment of Massive (originally "Microarray") Data Analysis (CAMDA) for large-scale biological data [4], BioCreative for biomedical text mining [5], the Assemblathon for sequence assembly, and the NCI-DREAM Challenges for various biomedical problems, amongst others [6].
Computational challenges also help solve new problems. While the original CASP experiment was designed to evaluate existing methods on a current problem, other challenges target areas for which no tools yet exist. This model has also spread successfully to industry, where companies such as Innocentive [7] and X-Prize [8] offer large prizes for solving novel problems.
Because these challenges are, on one hand, an exercise in community collaboration and, on the other, a competition, organizing one is fraught with difficulties and pitfalls. Having served as organizers, predictors, and assessors within several existing communities, we present ten rules that we believe should be observed when organizing a computational methods challenge:
[1] J. C. Costello, et al. Seeking the Wisdom of Crowds Through Challenge-Based Competitions in Biomedical Research. Clinical Pharmacology and Therapeutics, 2013.
[2] D. W. A. Buchan, et al. A large-scale evaluation of computational protein function prediction. Nature Methods, 2013.
[3] P. E. Bourne, et al. Ten Simple Rules for Organizing a Scientific Meeting. PLoS Computational Biology, 2008.
[4] A. Valencia, et al. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 2005.
[5] K. F. Johnson, et al. Call to work together on microarray data analysis. Nature, 2001.
[6] K. Fidelis, et al. A large-scale experiment to assess protein structure prediction methods. Proteins, 1995.