Towards Computational Evaluation of Evidence for Scientific Assertions with Nanopublications

On the Web, anyone can publish linked open data as RDF. Whilst this has huge potential to benefit data integration efforts, it also raises the challenge of assessing data quality and trust. Nanopublication is an approach to data and knowledge publication in which assertions are individually encoded in RDF along with details about provenance, context and attribution. Collectively, these details form a body of evidence for (or against) an assertion, which can be used as quality and trust criteria during data integration. In this position paper, we highlight the features of the Nanopublication specification that can be used as quality and trust criteria for life science data. We introduce the concept of cardinal assertions: assertions that are derived from the aggregation of multiple nanopublications to give an evidence value. We also identify a role for cardinal assertions in tracking the evolution of evidence over time, supporting the re-evaluation of data and hypotheses.
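
To make the idea of aggregating nanopublications concrete, the following is a minimal sketch and not the method of the paper. It assumes, purely for illustration, that a nanopublication can be reduced to a subject-predicate-object assertion plus a provenance source and an attributed author, and that the evidence value is simply the number of distinct sources supporting the same assertion. The class `Nanopublication`, the function `cardinal_assertions`, and all `ex:` identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopublication:
    """Simplified stand-in for a nanopublication (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: where the assertion was derived from
    author: str  # attribution: who published the nanopublication


def cardinal_assertions(nanopubs):
    """Group nanopublications by assertion and count distinct sources.

    Assumption: the evidence value is the number of distinct provenance
    sources; a real scheme could weight sources or evidence types.
    """
    sources_by_assertion = defaultdict(set)
    for nanopub in nanopubs:
        assertion = (nanopub.subject, nanopub.predicate, nanopub.obj)
        sources_by_assertion[assertion].add(nanopub.source)
    return {a: len(s) for a, s in sources_by_assertion.items()}


# Illustrative data only; none of these identifiers come from the paper.
nanopubs = [
    Nanopublication("ex:proteinA", "ex:interactsWith", "ex:proteinB",
                    "ex:databaseX", "ex:curator1"),
    Nanopublication("ex:proteinA", "ex:interactsWith", "ex:proteinB",
                    "ex:databaseY", "ex:curator2"),
    # Same source repeated: does not add independent evidence here.
    Nanopublication("ex:proteinA", "ex:interactsWith", "ex:proteinB",
                    "ex:databaseX", "ex:curator3"),
    Nanopublication("ex:proteinA", "ex:locatedIn", "ex:nucleus",
                    "ex:databaseX", "ex:curator1"),
]

evidence = cardinal_assertions(nanopubs)
for assertion, value in evidence.items():
    print(assertion, value)
```

Counting distinct sources rather than raw nanopublications is one possible design choice: it avoids inflating the evidence value when the same source is republished. Re-running the aggregation as new nanopublications appear would give one simple way to follow how evidence for an assertion changes over time.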
