Advancing computational biology and bioinformatics research through open innovation competitions

Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research where the use of competitions has yielded significant performance gains over established algorithms. The examples cover algorithms for clustering antibodies, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and their prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.
