Supervised Learning for Detection of Duplicates in Genomic Sequence Databases
Qingyu Chen | Karin M. Verspoor | Xiuzhen Zhang | Justin Zobel
[1] Karin M. Verspoor,et al. Evaluation of a Machine Learning Duplicate Detection Method for Bioinformatics Databases , 2015, DTMBIO@CIKM.
[2] Rishiraj Saha Roy,et al. Probabilistic Deduplication of Anonymous Web Traffic , 2015, WWW.
[3] J. Fitzgerald,et al. Understanding fraud: the nature of fraud offences recorded by NSW Police , 2015 .
[4] Guillaume J. Filion,et al. Starcode: sequence clustering based on all-pairs search , 2015, Bioinform..
[5] The UniProt Consortium,et al. UniProt: a hub for protein information , 2014, Nucleic Acids Res..
[6] Claire O'Donovan,et al. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data , 2014, Database J. Biol. Databases Curation.
[7] Elmer V. Bernstam,et al. A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation , 2014, J. Am. Medical Informatics Assoc..
[8] María Martín,et al. Activities at the Universal Protein Resource (UniProt) , 2013, Nucleic Acids Res..
[9] Min Song,et al. Mapping biological entities using the longest approximately common prefix method , 2014, BMC Bioinformatics.
[10] Elmer V. Bernstam,et al. Optimized Dual Threshold Entity Resolution For Electronic Health Record Databases - Training Set Size And Active Learning , 2013, AMIA.
[11] Riccardo Percudani,et al. Ureidoglycolate hydrolase, amidohydrolase, lyase: how errors in biological databases are incorporated in scientific papers and vice versa , 2013, Database J. Biol. Databases Curation.
[12] Valentin Guignon,et al. The Banana Genome Hub , 2013, Database J. Biol. Databases Curation.
[13] Yoshihiko Suhara,et al. Automatically generated spam detection based on sentence-level topic information , 2013, WWW '13 Companion.
[14] Liang Feng,et al. Practical Duplicate Bug Reports Detection in a Large Web-Based Development Community , 2013, APWeb.
[15] Shie-Jue Lee,et al. Detecting near-duplicate documents using sentence-level features and supervised learning , 2013, Expert Syst. Appl..
[16] Daniel W. A. Buchan,et al. A large-scale evaluation of computational protein function prediction , 2013, Nature Methods.
[17] Zhengwei Zhu,et al. CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..
[18] Andreas Thor,et al. Tailoring entity resolution for matching product offers , 2012, EDBT '12.
[19] Bruno Martins. A Supervised Machine Learning Approach for Duplicate Detection over Gazetteer Records , 2011, GeoS.
[20] Peter B. McGarvey,et al. A comprehensive protein-centric ID mapping service for molecular data integration , 2011, Bioinform..
[21] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[22] Min Song,et al. Detecting duplicate biological entities using Shortest Path Edit Distance , 2010, Int. J. Data Min. Bioinform..
[23] Ning Ma,et al. BLAST+: architecture and applications , 2009, BMC Bioinformatics.
[24] Patricia C. Babbitt,et al. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies , 2009, PLoS Comput. Biol..
[25] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SKDD.
[26] Min Song,et al. Detecting duplicate biological entities using Markov random field-based edit distance , 2008, 2008 IEEE International Conference on Bioinformatics and Biomedicine.
[27] Chih-Jen Lin,et al. A Practical Guide to Support Vector Classification , 2008 .
[28] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[29] Peter Christen,et al. Quality and Complexity Measures for Data Linkage and Deduplication , 2007, Quality Measures in Data Mining.
[30] Adam Godzik,et al. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..
[31] Paul T. J. Tan,et al. Duplicate Detection in Biological Data using Association Rule Mining , 2004 .
[32] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[33] Rajeev Motwani,et al. Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.
[34] Nathalie Japkowicz,et al. The class imbalance problem: A systematic study , 2002 .
[35] Nathalie Japkowicz,et al. The class imbalance problem: A systematic study , 2002, Intell. Data Anal..
[36] Chris Sander,et al. Removing near-neighbour redundancy from large protein sequence collections , 1998, Bioinform..
[37] Temple F. Smith,et al. The challenges of genome sequence annotation or “The devil is in the details” , 1997, Nature Biotechnology.
[38] S. Brunak,et al. Cleaning the GenBank Arabidopsis thaliana data set. , 1996, Nucleic acids research.