Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information

Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. The challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives by extending some of the ideas that arose during the NETTAB 2012 Workshop, with particular reference to the European context.
First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. Particular attention is given to the tendency to fund and create large-scale international research infrastructures and public-private partnerships in order to address the complex challenges of data-intensive science. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered.
In the current data- and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to limit the sharing of information and knowledge with other scientists.
We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new, powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. Finally, an open business model for bioinformatics, which appears able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is discussed.
