Using Papers Citations for Selecting the Best Genomic Databases

Selecting the right data is an essential activity in genomic information systems. This work analyzes whether the best genomic databases can be selected from a catalog using information about the citations of papers related to those databases. Citation information is used because proper metadata about these databases is not easy to obtain. Thus, in this work, paper-citation information is used to measure three distinct data quality dimensions: believability, timeliness, and relevancy. Believability is evaluated by inspecting the number of citations. The variation in the number of citations over time helps determine the recency of a database and is related to the timeliness dimension. Regarding relevancy, the keywords of the citing papers indicate the main application context of these databases.
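As a rough illustration of how the three dimensions could be derived from citation records, the following Python sketch computes one score per dimension. It is not the method used in this work: the `CitingPaper` record, the three-year recency window, and the keyword-overlap ratio are all assumptions made here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CitingPaper:
    """Hypothetical record for one paper that cites a database's paper."""
    year: int
    keywords: List[str] = field(default_factory=list)


def believability(papers: List[CitingPaper]) -> int:
    """Believability proxy: the raw number of citations."""
    return len(papers)


def timeliness(papers: List[CitingPaper], current_year: int, window: int = 3) -> float:
    """Timeliness proxy (assumed): share of citations from the last `window` years."""
    if not papers:
        return 0.0
    recent = sum(1 for p in papers if current_year - p.year < window)
    return recent / len(papers)


def relevancy(papers: List[CitingPaper], context_keywords: Set[str]) -> float:
    """Relevancy proxy (assumed): share of citing papers whose keywords
    overlap the user's context keywords, compared case-insensitively."""
    if not papers:
        return 0.0
    wanted = {k.lower() for k in context_keywords}
    hits = sum(1 for p in papers if wanted & {k.lower() for k in p.keywords})
    return hits / len(papers)


# Made-up citation records for a single database, for demonstration only.
citations = [
    CitingPaper(2008, ["genome", "variation"]),
    CitingPaper(2010, ["protein"]),
    CitingPaper(2011, ["Genome"]),
    CitingPaper(2005, []),
]

scores = {
    "believability": believability(citations),
    "timeliness": timeliness(citations, current_year=2011),
    "relevancy": relevancy(citations, {"genome"}),
}
print(scores)
```

Databases in a catalog could then be ranked by any one score or by a weighted combination. How to weight the dimensions is a design choice this sketch leaves open.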
