Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases
Qingyu Chen | Justin Zobel | Arthur Liberzon | Michele Magrane | Ivan Erill | Karin Verspoor | Marc Robinson-Rechavi | Jun-ichi Onami | Ramona Britto | Jana Sponarova | Constance J. Jeffery
[1] Oren Etzioni,et al. The elephant in the room: getting value from Big Data , 2015, WebDB.
[2] David S. Goodsell,et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information , 2016, Nucleic Acids Res..
[3] Michele Magrane,et al. UniProt Knowledgebase: a hub of integrated protein data , 2011, Database J. Biol. Databases Curation.
[4] Zhengwei Zhu,et al. CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..
[5] S. Persson,et al. Identification of Clinical Aeromonas Species by rpoB and gyrB Sequencing and Development of a Multiplex PCR Method for Detection of Aeromonas hydrophila, A. caviae, A. veronii, and A. media , 2014, Journal of Clinical Microbiology.
[6] Michael Y. Galperin,et al. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes , 2017, Nucleic acids research.
[7] A. Krogh,et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. , 2001, Journal of molecular biology.
[8] Sébastien Moretti,et al. Bgee: Integrating and Comparing Heterogeneous Transcriptome Data Among Species , 2008, DILS.
[9] M. Gillespie,et al. Guidelines for the functional annotation of microRNAs using the Gene Ontology , 2016, RNA.
[10] Jürg Bähler,et al. PomBase 2015: updates to the fission yeast database , 2014, Nucleic Acids Res..
[11] Mohamed A. Sharaf,et al. A Framework for Data Quality Aware Query Systems , 2011, DASFAA Workshops.
[12] E Pennisi,et al. Keeping Genome Databases Clean and Up to Date , 1999, Science.
[13] Carlo Batini,et al. Data and Information Quality , 2016, Data-Centric Systems and Applications.
[14] Qian Li,et al. Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model , 2016, Database J. Biol. Databases Curation.
[15] Marcus C. Chibucos,et al. The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations , 2015, Database J. Biol. Databases Curation.
[16] Cassie S. Mitchell,et al. Undergraduate Biocuration: Developing Tomorrow's Researchers While Mining Today's Data. , 2015, Journal of undergraduate neuroscience education : JUNE : a publication of FUN, Faculty for Undergraduate Neuroscience.
[17] Karin M. Verspoor,et al. Literature consistency of bioinformatics sequence databases is effective for assessing record quality , 2017, bioRxiv.
[18] Bei Wu,et al. Investigation on the Association of Soil Microbial Populations with Ecological and Environmental Factors in the Pearl River Estuary , 2018 .
[19] Min Song,et al. Detecting duplicate biological entities using Markov random field-based edit distance , 2009, Knowledge and Information Systems.
[20] Philip E. Bourne,et al. Biocurators: Contributors to the World of Science , 2006, PLoS Comput. Biol..
[21] Kristof Coussement,et al. Data Accuracy's Impact on Segmentation Performance: Benchmarking RFM Analysis, Logistic Regression, and Decision Trees , 2012 .
[22] Pascale Gaudet,et al. Best Practices in Manual Annotation with the Gene Ontology. , 2017, Methods in molecular biology.
[23] Cathy H. Wu,et al. UniProt: the Universal Protein knowledgebase , 2004, Nucleic Acids Res..
[24] A Bairoch,et al. SWISS-PROT: connecting biomolecular knowledge via a protein database. , 2001, Current issues in molecular biology.
[25] Amos Bairoch,et al. The Sulfinator: predicting tyrosine sulfation sites in protein sequences , 2002, Bioinform..
[26] Jon R Lorsch,et al. Perspective: Sustaining the big-data ecosystem , 2015, Nature.
[27] The Gene Ontology Consortium,et al. Expansion of the Gene Ontology knowledgebase and resources , 2016, Nucleic Acids Res..
[28] Oliver Horlacher,et al. The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases , 2015, Nucleic Acids Res..
[29] Rachael P. Huntley,et al. Standardized description of scientific evidence using the Evidence Ontology (ECO) , 2014, Database J. Biol. Databases Curation.
[30] M S Waterman,et al. Genomic sequence databases. , 1990, Genomics.
[31] Veda C. Storey,et al. A Framework for Analysis of Data Quality Research , 1995, IEEE Trans. Knowl. Data Eng..
[32] Narmada Thanki,et al. CDD: NCBI's conserved domain database , 2014, Nucleic Acids Res..
[33] Paul T. J. Tan,et al. Duplicate Detection in Biological Data using Association Rule Mining , 2004 .
[34] Hans-Michael Müller,et al. Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature , 2004, PLoS biology.
[35] Karin M. Verspoor,et al. Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases , 2018, ACM J. Data Inf. Qual..
[36] Claire O'Donovan,et al. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data , 2014, Database J. Biol. Databases Curation.
[37] S. Brunak,et al. Cleaning the GenBank Arabidopsis thaliana data set. , 1996, Nucleic acids research.
[38] Karin M. Verspoor,et al. Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study , 2016, bioRxiv.
[39] Jocelyn Kaiser,et al. BIOMEDICAL RESOURCES. Funding for key data resources in jeopardy. , 2016, Science.
[40] Ivan Erill,et al. CollecTF: a database of experimentally validated transcription factor-binding sites in Bacteria , 2013, Nucleic Acids Res..
[41] Evan Bolton,et al. Database resources of the National Center for Biotechnology Information , 2017, Nucleic Acids Res..
[42] D. Higgins,et al. T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.
[43] Rex L. Chisholm,et al. dictyBase 2013: integrating multiple Dictyostelid species , 2012, Nucleic Acids Res..
[44] S. Brunak,et al. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. , 2005, Glycobiology.
[45] Diane M. Strong,et al. Beyond Accuracy: What Data Quality Means to Data Consumers , 1996, J. Manag. Inf. Syst..
[46] S. Brunak,et al. Locating proteins in the cell using TargetP, SignalP and related tools , 2007, Nature Protocols.
[47] Zi Huang,et al. Near-duplicate video retrieval: Current research and future trends , 2013, CSUR.
[48] Silvio C. E. Tosatto,et al. InterPro in 2017—beyond protein family and domain annotations , 2016, Nucleic Acids Res..
[49] Patricia C. Babbitt,et al. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies , 2009, PLoS Comput. Biol..
[50] Sandra Orchard,et al. Data standardization and sharing-the work of the HUPO-PSI. , 2014, Biochimica et biophysica acta.
[51] Kara Dolinski,et al. The BioGRID interaction database: 2017 update , 2016, Nucleic Acids Res..
[52] Amos Bairoch,et al. The neXtProt knowledgebase on human proteins: 2017 update , 2016, Nucleic Acids Res..
[53] Robert D. Finn,et al. The Pfam protein families database: towards a more sustainable future , 2015, Nucleic Acids Res..
[54] Peter D. Karp,et al. The comprehensive updated regulatory network of Escherichia coli K-12 , 2006, BMC Bioinformatics.
[55] Elisabeth Coudert,et al. HAMAP in 2015: updates to the protein family classification and annotation system , 2014, Nucleic Acids Res..
[56] Karin M. Verspoor,et al. Coreference resolution improves extraction of Biological Expression Language statements from texts , 2016, Database J. Biol. Databases Curation.
[57] Erika Check Hayden,et al. Funding for model-organism databases in trouble , 2016 .
[58] Brijesh Sharma,et al. Molecular characterization and in vitro antifungal susceptibility of 80 clinical isolates of mucormycetes in Delhi, India , 2014, Mycoses.
[59] K. Bretonnel Cohen,et al. Text mining for the biocuration workflow , 2012, Database J. Biol. Databases Curation.
[60] C. Ponting,et al. Homology-based method for identification of protein repeats using statistical significance estimates. , 2000, Journal of molecular biology.
[61] Claire O'Donovan,et al. Biocurators and Biocuration: surveying the 21st century challenges , 2012, Database J. Biol. Databases Curation.
[62] J. Thompson,et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.
[63] Maria Jesus Martin,et al. Minimizing proteome redundancy in the UniProt Knowledgebase , 2016, Database J. Biol. Databases Curation.
[64] Peter B. McGarvey,et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches , 2014, Bioinform..
[65] Helga Thorvaldsdóttir,et al. Molecular signatures database (MSigDB) 3.0 , 2011, Bioinform..
[66] Guy Cochrane,et al. The International Nucleotide Sequence Database Collaboration , 2011, Nucleic Acids Res..
[67] Zhiyong Lu,et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study , 2017, Bioinform..
[68] Robert C. Edgar,et al. MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.
[69] A Bairoch,et al. Go hunting in sequence databases but watch out for the traps. , 1996, Trends in genetics : TIG.
[70] Qingyu Chen,et al. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases , 2016, PLoS ONE.
[71] Peter D. Karp,et al. How much does curation cost? , 2016, Database J. Biol. Databases Curation.
[72] Walter R. Gilks,et al. Modeling the percolation of annotation errors in a database of protein sequences , 2002, Bioinform..
[73] Jean-Claude Tardif,et al. Damming the genomic data flood using a comprehensive analysis and storage data structure , 2010, Database J. Biol. Databases Curation.
[74] Barbara Wixom,et al. An Empirical Investigation of the Factors Affecting Data Warehousing Success , 2001, MIS Q..
[75] Thomas L. Madden,et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.
[76] Yifan Peng,et al. Improving chemical disease relation extraction with rich features and weakly labeled data , 2016, Journal of Cheminformatics.
[77] Patrick Ruch,et al. Application of text-mining for updating protein post-translational modification annotation in UniProtKB , 2012, BMC Bioinformatics.
[78] B. Langmead,et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive , 2016, Genome Biology.
[79] Cathy H. Wu,et al. Protein Bioinformatics From Protein Modifications and Networks to Proteomics , 2017 .
[80] Gregory D. Schuler,et al. Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.
[81] Haipeng Liu,et al. MoonProt: a database for proteins that are known to moonlight , 2013, Nucleic Acids Res..
[82] Min Song,et al. Detecting duplicate biological entities using Markov random field-based edit distance , 2008, 2008 IEEE International Conference on Bioinformatics and Biomedicine.
[83] Robert Petryszak,et al. ArrayExpress update—simplifying data submissions , 2014, Nucleic Acids Res..
[84] Guy Cochrane,et al. The International Nucleotide Sequence Database Collaboration , 2012, Nucleic Acids Res..
[86] Donald P. Ballou,et al. Modeling Data and Process Quality in Multi-Input, Multi-Output Information Systems , 1985 .
[87] Wenfei Fan,et al. Data Quality: From Theory to Practice , 2015, SGMD.
[88] Maria Jesus Martin,et al. From data repositories to submission portals: rethinking the role of domain-specific databases in CollecTF , 2016, Database J. Biol. Databases Curation.
[89] Alex Bateman,et al. Curators of the world unite: the International Society of Biocuration , 2010, Bioinform..
[90] J. Michael Cherry,et al. Prevention of data duplication for high throughput sequencing repositories , 2018, Database J. Biol. Databases Curation.
[91] Ricardo Villamarín-Salomón,et al. ClinVar: public archive of interpretations of clinically relevant variants , 2015, Nucleic Acids Res..
[92] Qingyu Chen,et al. Benchmarks for Measurement of Duplicate Detection Methods in Nucleotide Databases , 2016, bioRxiv.
[93] Michele Magrane,et al. Searching and Navigating UniProt Databases , 2015, Current protocols in bioinformatics.
[94] Paolo Papotti,et al. Big data quality - whose problem is it? , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).
[95] Friedhelm Pfeiffer,et al. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchael Genomes , 2015, Life.
[96] Cathy H. Wu,et al. Activities at the Universal Protein Resource (UniProt) , 2014, Nucleic Acids Research.
[97] J. Mesirov,et al. The Molecular Signatures Database Hallmark Gene Set Collection , 2015 .
[98] A. Baxevanis. The Importance of Biological Databases in Biological Discovery , 2003, Current protocols in bioinformatics.
[99] Felix Naumann,et al. Data Quality in Genome Databases , 2003, ICIQ.
[100] Robert D. Finn,et al. Rfam 12.0: updates to the RNA families database , 2014, Nucleic Acids Res..
[101] Winston A Hide,et al. Big data: The future of biocuration , 2008, Nature.
[102] Christian Cole,et al. The Jpred 3 secondary structure prediction server , 2008, Nucleic Acids Res..
[103] Sean R. Davis,et al. NCBI GEO: archive for functional genomics data sets—update , 2012, Nucleic Acids Res..
[104] The Uniprot Consortium,et al. UniProt: a hub for protein information , 2014, Nucleic Acids Res..
[105] Midori A. Harris,et al. Canto: an online tool for community literature curation , 2014, Bioinform..
[106] Rolf Apweiler,et al. A novel method for automatic functional annotation of proteins , 1999, Bioinform..
[107] Zhiyong Lu,et al. On expert curation and sustainability: UniProtKB/Swiss-Prot as a case study , 2016, bioRxiv.
[108] Anne Niknejad,et al. Uncovering hidden duplicated content in public transcriptomics data , 2013, Database J. Biol. Databases Curation.
[109] Zhiyong Lu,et al. PubTator: a web-based text mining tool for assisting biocuration , 2013, Nucleic Acids Res..
[110] S. Guptill,et al. Elements of Spatial Data Quality , 1995 .
[111] S. Poux. On expert curation and sustainability: UniProtKB/Swiss-Prot as a case study , 2017 .
[112] Guy Cochrane,et al. European Nucleotide Archive in 2016 , 2016, Nucleic Acids Res..