Semantic Web Platforms for Bioinformatics and Life Sciences

The amount of data generated in the life sciences has increased exponentially in the past few years, with DNA sequencing outpacing Moore’s law since 2008. This poses new challenges for data integration and analysis, which must cope with a massive amount of information. We review some of the most promising platforms that leverage the Semantic Web approach, a powerful paradigm with the potential to address many of the issues faced in bioinformatics. In doing so, we introduce the field by evaluating ontologies and middleware and by highlighting present and future trends. In the last couple of years, interest in Big Data and NoSQL technologies has outshone the vision of the Semantic Web; here we advocate merging some of these technologies in order to leverage both paradigms. The time is ripe for industry-driven and research-driven architectures to come together and deliver usable tools in several interdisciplinary fields.

The scientific discovery process has shifted from the traditional cycle of formulating hypotheses, performing experiments and interpreting the results towards a more complex, data-driven workflow, as many fields have changed dramatically thanks to the heavy use of computers. The life sciences in particular have become increasingly information- and data-centric in the last decade, most notably thanks to the availability of new sequencing and measurement techniques. Dealing with this vast amount of information requires splitting the traditional interpretation task into several steps, leaning towards a data-driven methodology comprising Data Management, Analysis and Mining. This approach encompasses several skills and requires contributions from different areas of expertise in order to properly formulate the experiments, analyze the results and derive new insights and conclusions.
As experiments in biology and the life sciences become increasingly high-throughput, it is ever more important to express data in forms that computers can read and that can be shared across experiments and data sources. The Semantic Web (Berners-Lee, 2001) is a technology stack backed by the World Wide Web Consortium (W3C) that tackles this problem; ontologies, which facilitate semantic search and information integration, are a fundamental part of this stack. Although the dynamic nature of biological information has limited the development of ontologies that support dynamic reasoning for knowledge discovery, we argue that the time is right for bio-ontologies to be developed and exploited to their full potential. Here we aim to introduce the field and some use cases of bio-ontologies.