Data quality-aware genomic data integration
[1] Marco Masseroli, et al. Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data, 2018, Bioinform.
[2] Lorena Etcheverry, et al. Data Quality Metrics for Genome Wide Association Studies, 2010, 2010 Workshops on Database and Expert Systems Applications.
[3] Steven G. Johnson, et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data, 2016, EGEMS.
[4] Zhiyong Lu, et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study, 2017, Bioinform.
[5] Ana León, et al. Data Quality Problems When Integrating Genomic Information, 2016, ER Workshops.
[6] Sanjay Ranka, et al. BioDQ: Data Quality Estimation and Management for Genomics Databases, 2008, ISBRA.
[7] Brendan W. Vaughan, et al. The 1000 Genomes Project: data management and community access, 2012, Nature Methods.
[8] Shicai Wang, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer, 2018, Nucleic Acids Res.
[9] Dennis A. Benson, et al. GenBank, 2018, Nucleic Acids Res.
[10] Massimiliano Izzo, et al. FAIRsharing as a community approach to standards, repositories and policies, 2019, Nature Biotechnology.
[11] Avi Ma'ayan, et al. Mining data and metadata from the gene expression omnibus, 2018, Biophysical Reviews.
[12] Qingyu Chen, et al. Benchmarks for Measurement of Duplicate Detection Methods in Nucleotide Databases, 2016, bioRxiv.
[13] Ellen T. Gelfand, et al. The Genotype-Tissue Expression (GTEx) project, 2013, Nature Genetics.
[14] Qingyu Chen, et al. Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases, 2019, bioRxiv.
[15] Ramkiran Gouripeddi, et al. Towards a content agnostic computable knowledge repository for data quality assessment, 2019, Comput. Methods Programs Biomed.
[16] Diego Marcheggiani, et al. On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports, 2017, JDIQ.
[17] Yike Guo, et al. Consistency, comprehensiveness, and compatibility of pathway databases, 2010, BMC Bioinformatics.
[18] Thomas Redman, et al. Data quality for the information age, 1996.
[19] Oscar Pastor, et al. A Method to Identify Relevant Genome Data: Conceptual Modeling for the Medicine of Precision, 2018, ER.
[20] Roy Pardee, et al. The HMO Research Network Virtual Data Warehouse: A Public Data Model to Support Collaboration, 2014, EGEMS.
[21] Carole A. Goble, et al. Data curation + process curation = data integration + science, 2008, Briefings Bioinform.
[22] Fabian Prasser, et al. Improving Data Quality in Medical Research: A Monitoring Architecture for Clinical and Translational Data Warehouses, 2020, 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS).
[23] Michael Q. Zhang, et al. Integrative analysis of 111 reference human epigenomes, 2015, Nature.
[24] Diane M. Strong, et al. Beyond Accuracy: What Data Quality Means to Data Consumers, 1996, J. Manag. Inf. Syst.
[25] Kei-Hoi Cheung, et al. CEDAR: Semantic Web Technology to Support Open Science, 2018, WWW.
[26] Nan Deng, et al. Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis, 2018, Database.
[27] Mark A. Musen, et al. The Open Biomedical Annotator, 2009, Summit on translational bioinformatics.
[28] Richard S. Sandstrom, et al. BEDOPS: high-performance genomic feature operations, 2012, Bioinform.
[29] Anna Zhukova, et al. Modeling sample variables with an Experimental Factor Ontology, 2010, Bioinform.
[30] O Bodenreider, et al. Biomedical ontologies in action: role in knowledge management, data integration and decision support, 2008, Yearbook of medical informatics.
[31] Samuel T. Savitz, et al. How much can we trust electronic health record data?, 2020, Healthcare.
[32] Marco Masseroli, et al. GenoSurf: metadata driven semantic search system for integrated genomic datasets, 2019, Database J. Biol. Databases Curation.
[33] J. Michael Cherry, et al. Prevention of data duplication for high throughput sequencing repositories, 2018, Database J. Biol. Databases Curation.
[34] Astrid Gall, et al. Ensembl 2018, 2017, Nucleic Acids Res.
[35] Marco Masseroli, et al. META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration, 2020, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[36] K. Sanderson. Bioinformatics: Curation generation, 2011, Nature.
[37] Susan B. Davidson, et al. BioGuideSRS: querying multiple sources with a user-centric perspective, 2007, Bioinform.
[38] G. Lin, et al. A comparison framework and guideline of clustering methods for mass cytometry data, 2019, Genome Biology.
[39] Tatiana A. Tatusova, et al. BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata, 2011, Nucleic Acids Res.
[40] Mark Gerstein, et al. GENCODE reference annotation for the human and mouse genomes, 2018, Nucleic Acids Res.
[41] Hedi Peterson, et al. The bio.tools registry of software tools and data resources for the life sciences, 2019, Genome Biology.
[42] Bartek Wilczynski, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics, 2009, Bioinform.
[43] José Fabián Reyes Román, et al. Using conceptual modeling to improve genome data management, 2020, Briefings Bioinform.
[44] Joshua M. Stuart, et al. The Cancer Genome Atlas Pan-Cancer analysis project, 2013, Nature Genetics.
[45] Syed Haider, et al. International Cancer Genome Consortium Data Portal—a one-stop shop for cancer genomics data, 2011, Database J. Biol. Databases Curation.
[46] Ricardo Cruz-Correia, et al. Personalised medicine challenges: quality of data, 2018, International Journal of Data Science and Analytics.
[47] Olivier Bodenreider, et al. The Unified Medical Language System (UMLS): integrating biomedical terminology, 2004, Nucleic Acids Res.
[48] A Bairoch, et al. SWISS-PROT: connecting biomolecular knowledge via a protein database, 2001, Current issues in molecular biology.
[49] Sean R. Davis, et al. NCBI GEO: archive for functional genomics data sets—update, 2012, Nucleic Acids Res.
[50] Marco Masseroli, et al. OpenGDC: Unifying, Modeling, Integrating Cancer Genomic Data and Clinical Metadata, 2020, Applied Sciences.
[51] Elena Baralis, et al. Data Cleaning and Semantic Improvement in Biological Databases, 2006, J. Integr. Bioinform.
[52] James B Thissen, et al. Manipulation of the Gut Microbiome Alters Acetaminophen Biodisposition in Mice, 2020, Scientific Reports.
[53] Marco Masseroli, et al. TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas, 2017, BMC Bioinformatics.
[54] Helen E. Parkinson, et al. BioSamples database: an updated sample metadata hub, 2018, Nucleic Acids Res.
[55] Stefano Ceri, et al. Ontology-driven metadata enrichment for genomic datasets, 2018, SWAT4LS.
[56] Marco Masseroli, et al. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying, 2016, Methods.
[57] Chris Morris, et al. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data, 2017, bioRxiv.
[58] Fouzia Moussouni, et al. Quality-Aware Integration and Warehousing of Genomic Data, 2005, ICIQ.
[59] S. Schuster. Next-generation sequencing transforms today's biology, 2008, Nature Methods.
[60] Marco Masseroli, et al. Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome, 2017, DAMDID/RCDL.
[61] Data production leads, et al. An integrated encyclopedia of DNA elements in the human genome, 2012.
[62] Maria Jesus Martin, et al. Minimizing proteome redundancy in the UniProt Knowledgebase, 2016, Database J. Biol. Databases Curation.
[63] Helen E. Parkinson, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019, 2018, Nucleic Acids Res.
[64] Felix Naumann, et al. Data Quality in Genome Databases, 2003, ICIQ.
[65] Stefano Paraboschi, et al. Designing data marts for data warehouses, 2001, TSEM.
[66] Eugenia Galeota, et al. Ontology-driven integrative analysis of omics data through Onassis, 2020, Scientific Reports.
[67] Alan R. Moody, et al. From Big Data to Precision Medicine, 2019, Front. Med.
[68] David Robinson, et al. Research resources: curating the new eagle-i discovery system, 2012, Database J. Biol. Databases Curation.
[69] Claire O'Donovan, et al. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data, 2014, Database J. Biol. Databases Curation.
[70] Antonio Mauro Saraiva, et al. A conceptual framework for quality assessment and management of biodiversity data, 2017, PloS one.
[71] Stefano Ceri, et al. Exploiting Conceptual Modeling for Searching Genomic Metadata: A Quantitative and Qualitative Empirical Study, 2019, ER Workshops.
[72] Oscar Pastor, et al. Applying Conceptual Modeling to Better Understand the Human Genome, 2016, ER.
[73] Chunhua Weng, et al. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research, 2013, J. Am. Medical Informatics Assoc.
[74] S. Lewis, et al. Uberon, an integrative multi-species anatomy ontology, 2012, Genome Biology.
[75] Joachim Hammer, et al. Making quality count in biological data sources, 2005, IQIS '05.
[76] Wenfei Fan, et al. Data Quality: From Theory to Practice, 2015, SGMD.
[77] James C. Hu, et al. The Gene Ontology Resource: 20 years and still GOing strong, 2019.
[78] Tatiana A. Tatusova, et al. Entrez Gene: gene-centered information at NCBI, 2004, Nucleic Acids Res.
[79] Allison P. Heath, et al. Toward a Shared Vision for Cancer Genomic Data, 2016, The New England journal of medicine.
[80] Ulf Leser, et al. Improving data quality by source analysis, 2012, JDIQ.
[81] Cory B. Giles, et al. ALE: automated label extraction from GEO metadata, 2017, BMC Bioinformatics.
[82] Alessandro Campi, et al. Conceptual Modeling for Genomics: Building an Integrated Repository of Open Data, 2017, ER.
[83] Microarray standards at last, 2002, Nature.
[84] Douglas Boyle, et al. Improving a Secondary Use Health Data Warehouse: Proposing a Multi-Level Data Quality Framework, 2019, EGEMS.
[85] Michel Dumontier, et al. MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment, 2019, Hum. Comput.
[86] Rasko Leinonen, et al. The sequence read archive: explosive growth of sequencing data, 2011, Nucleic Acids Res.
[87] Erhard Rahm, et al. Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach, 2004, EDBT.
[88] Joshua M. Korn, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia, 2019, Nature.
[89] Alan R. Aronson, et al. An overview of MetaMap: historical perspective and recent advances, 2010, J. Am. Medical Informatics Assoc.
[90] Carole A. Goble, et al. Bioschemas: From Potato Salad to Protein Annotation, 2017, SEMWEB.
[91] Steve Pettifer, et al. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats, 2013, Bioinform.
[92] Paul T. J. Tan, et al. Duplicate Detection in Biological Data using Association Rule Mining, 2004.
[93] Michel Dumontier, et al. Predicting structured metadata from unstructured metadata, 2016, Database J. Biol. Databases Curation.
[94] Rong Chen, et al. Ontology-driven indexing of public datasets for translational bioinformatics, 2009, BMC Bioinformatics.
[95] Xiaoyan Zhang, et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis, 2018, Nucleic Acids Res.
[96] Aaron R. Quinlan, et al. BIOINFORMATICS APPLICATIONS NOTE, 2022.
[97] Jennifer Widom, et al. Tracing the lineage of view data in a warehousing environment, 2000, TODS.
[98] Anila Sahar Butt, et al. Where to search top-K biomedical ontologies?, 2018, Briefings Bioinform.
[99] F. Arnaud, et al. From core referencing to data re-use: two French national initiatives to reinforce paleodata stewardship (National Cyber Core Repository and LTER France Retro-Observatory), 2017.
[100] Carlo Batini, et al. Data and Information Quality, 2016, Data-Centric Systems and Applications.
[101] Raphael Gottardo, et al. Orchestrating high-throughput genomic analysis with Bioconductor, 2015, Nature Methods.
[102] Mark A. Musen, et al. The variable quality of metadata about biological samples used in biomedical experiments, 2018, Scientific Data.
[103] Gilberto Fragoso, et al. The NCI Thesaurus quality assurance life cycle, 2009, J. Biomed. Informatics.
[104] Stefano Ceri, et al. From a Conceptual Model to a Knowledge Graph for Genomic Datasets, 2019, ER.
[105] Hanlee P. Ji, et al. Data quality in genomics and microarrays, 2006, Nature Biotechnology.
[106] Nuno A. Fonseca, et al. ArrayExpress update – from bulk to single-cell expression data, 2018, Nucleic Acids Res.
[107] Zhiyong Lu, et al. Community challenges in biomedical text mining over 10 years: success, failure and the future, 2016, Briefings Bioinform.
[108] Ulf Leser, et al. Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW, 2005, DILS.
[109] Julien Grosjean, et al. Health multi-terminology portal: a semantic added-value for patient safety, 2011, Studies in health technology and informatics.
[110] S. Samarajiwa, et al. Challenges and Cases of Genomic Data Integration Across Technologies and Biological Scales, 2018.
[111] Alexander D. Diehl, et al. Logical Development of the Cell Ontology, 2011, BMC Bioinformatics.
[112] Karin M. Verspoor, et al. Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases, 2018, ACM J. Data Inf. Qual.
[113] Alun D. Preece, et al. Quality views: capturing and exploiting the user perspective on data quality, 2006, VLDB.
[114] Rodrigo Lopez, et al. The EBI search engine: EBI search as a service—making biological data accessible for all, 2017, Nucleic Acids Res.
[115] M. Schatz, et al. Big Data: Astronomical or Genomical?, 2015, PLoS biology.
[116] Wen J. Li, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation, 2015, Nucleic Acids Res.
[117] Stuart E. Madnick, et al. Editors’ Comments: ACM Journal of Data and Information Quality (JDIQ) is alive and well!, 2010, JDIQ.
[118] Les Gasser, et al. A framework for information quality assessment, 2007, J. Assoc. Inf. Sci. Technol.
[119] Fouzia Moussouni, et al. QDex: A Database Profiler for Generic Bio-data Exploration and Quality Aware Integration, 2007, WISE Workshops.
[120] Karin M. Verspoor, et al. Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study, 2016, bioRxiv.
[121] Patrick B. Ryan, et al. A Comparison of Data Quality Assessment Checks in Six Data Sharing Networks, 2017, EGEMS.
[122] Laure Berti-Équille, et al. Cleaning, Integrating, and Warehousing Genomic Data From Biomedical Resources, 2013.
[123] Karin M. Verspoor, et al. Automated detection of records in biological sequence databases that are inconsistent with the literature, 2017, J. Biomed. Informatics.
[124] Martin J. O'Connor, et al. Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases, 2019, Database J. Biol. Databases Curation.
[125] Karin M. Verspoor, et al. Literature consistency of bioinformatics sequence databases is effective for assessing record quality, 2017, bioRxiv.
[126] Eugenia Galeota, et al. Ontology-based annotations and semantic relations in large-scale (epi)genomics data, 2016, Briefings Bioinform.
[127] Elena Baralis, et al. Extraction of Constraints from Biological Data, 2009, Biomedical Data and Applications.
[128] Marco Masseroli, et al. The road towards data integration in human genomics: players, steps and interactions, 2020, Briefings Bioinform.