Paper information - Machine learning and genome annotation: a match meant to be? - 字舞流文

Machine learning and genome annotation: a match meant to be?

Kevin Y. Yip | M. Gerstein | Chao Cheng

[1] L. Breiman. Random Forests , 2001, Machine Learning.

[2] Sugato Basu,et al. Semi-Supervised Learning , 2019, Encyclopedia of Database Systems.

[3] G. Natoli,et al. Non-coding transcription at cis-regulatory elements: computational and experimental approaches. , 2013, Methods.

[4] Kevin Y. Yip,et al. Machine learning and genome annotation: a match meant to be? , 2013, Genome Biology.

[5] Atina G. Coté,et al. Evaluation of methods for modeling transcription factor sequence specificity , 2013, Nature Biotechnology.

[6] William Stafford Noble,et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors , 2012, Genome research.

[7] Kevin Y. Yip,et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors , 2012, Genome Biology.

[8] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[9] Raymond K. Auerbach,et al. An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[10] D. Bartel,et al. Long noncoding RNAs in C. elegans , 2012, Genome research.

[11] Graziano Pesole,et al. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era , 2012, Briefings Bioinform..

[12] M. Yandell,et al. A beginner's guide to eukaryotic genome annotation , 2012, Nature Reviews Genetics.

[13] William Stafford Noble,et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation , 2012, Nature Methods.

[14] Michael Fernández,et al. Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines , 2012, Nucleic acids research.

[15] Manolis Kellis,et al. ChromHMM: automating chromatin-state discovery and characterization , 2012, Nature Methods.

[16] Kevin Y Yip,et al. Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors , 2011, Genome Biology.

[17] Yungki Park,et al. Revisiting the negative example sampling problem for predicting protein-protein interactions , 2011, Bioinform..

[18] Zhong Wang,et al. Next-generation transcriptome assembly , 2011, Nature Reviews Genetics.

[19] Michael A. Beer,et al. Discriminative prediction of mammalian enhancers from DNA sequence. , 2011, Genome research.

[20] Rolf Backofen,et al. Computational discovery of human coding and non-coding transcripts with conserved splice sites , 2011, Bioinform..

[21] Cole Trapnell,et al. Computational methods for transcriptome annotation and quantification using RNA-seq , 2011, Nature Methods.

[22] Raymond K. Auerbach,et al. Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data. , 2011, Genome research.

[23] Raymond K. Auerbach,et al. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project , 2010, Science.

[24] Sarah A. Teichmann,et al. Assessing Computational Methods of Cis-Regulatory Module Prediction , 2010, PLoS Comput. Biol..

[25] D. Altshuler,et al. A map of human genome variation from population-scale sequencing , 2010, Nature.

[26] T. Mikkelsen,et al. The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[27] Brendan J. Frey,et al. Deciphering the splicing code , 2010, Nature.

[28] Cole Trapnell,et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[29] B. Ren,et al. Genome-wide prediction of transcription factor binding sites using an integrated model , 2010, Genome Biology.

[30] Cheng Soon Ong,et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. , 2009, Genome research.

[31] Tim R. Mercer,et al. Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities , 2008, PLoS Comput. Biol..

[32] B. Williams,et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[33] M. Gerstein,et al. The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing , 2008, Science.

[34] William Stafford Noble,et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[35] Allen D. Delaney,et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing , 2007, Nature Methods.

[36] A. Mortazavi,et al. Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[37] Jonathan Livny,et al. Identification of small RNAs in diverse bacterial species. , 2007, Current opinion in microbiology.

[38] A. Philippakis,et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities , 2006, Nature Biotechnology.

[39] Inna Dubchak,et al. VISTA Enhancer Browser—a database of tissue-specific human enhancers , 2006, Nucleic Acids Res..

[40] P. D’haeseleer. How does DNA sequence motif discovery work? , 2006, Nature Biotechnology.

[41] Ivan Ovcharenko,et al. Predicting tissue-specific enhancers in the human genome. , 2006, Genome research.

[42] Wilfred W. Li,et al. MEME: discovering and analyzing DNA and protein sequence motifs , 2006, Nucleic Acids Res..

[43] Eugene Berezikov,et al. Approaches to microRNA discovery , 2006, Nature Genetics.

[44] William Stafford Noble,et al. Choosing negative examples for the prediction of protein-protein interactions , 2006, BMC Bioinformatics.

[45] E. Ukkonen,et al. Genome-wide Prediction of Mammalian Enhancers Based on Analysis of Transcription-Factor Binding Affinity , 2006, Cell.

[46] Michael R Brent,et al. Genome annotation past, present, and future: how to define an ORF at each locus. , 2005, Genome research.

[47] Bin Li,et al. Limitations and potentials of current motif discovery algorithms , 2005, Nucleic acids research.

[48] D. Haussler,et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. , 2005, Genome research.

[49] Francesca Chiaromonte,et al. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. , 2005, Genome research.

[50] E. Davidson,et al. Gene regulatory networks for development. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[51] Paul T. Groth,et al. The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[52] M. Gerstein,et al. Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. , 2004, Current opinion in microbiology.

[53] Jason Weston,et al. Mismatch string kernels for discriminative protein classification , 2004, Bioinform..

[54] Simon Parsons,et al. Bioinformatics: The Machine Learning Approach by P. Baldi and S. Brunak, 2nd edn, MIT Press, 452 pp., $60.00, ISBN 0-262-02506-X , 2004, The Knowledge Engineering Review.

[55] Bernhard Schölkopf,et al. Kernel Methods in Computational Biology , 2005 .

[56] I. Jolliffe. Principal Component Analysis , 2005 .

[57] Michael Q. Zhang. Computational prediction of eukaryotic protein-coding genes , 2002, Nature Reviews Genetics.

[58] A. Telser. Molecular Biology of the Cell, 4th Edition , 2002 .

[59] T. Hubbard,et al. Computational detection and location of transcription start sites in mammalian genomic DNA. , 2002, Genome research.

[60] J. V. Moran,et al. Initial sequencing and analysis of the human genome. , 2001, Nature.

[61] D. Botstein,et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF , 2001, Nature.

[62] R. Guigó,et al. An assessment of gene prediction accuracy in large DNA sequences. , 2000, Genome research.

[63] G. Stormo. Gene-finding approaches for eukaryotes. , 2000, Genome research.

[64] Sean R. Eddy,et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[65] D. S. Fields,et al. Specificity, free energy and information content in protein-DNA interactions. , 1998, Trends in biochemical sciences.

[66] M. Borodovsky,et al. GeneMark.hmm: new solutions for gene finding. , 1998, Nucleic acids research.

[67] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[68] S. Karlin,et al. Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[69] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.

[70] David Haussler,et al. A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA , 1996, ISMB.

[71] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.

[72] D. Haussler,et al. A hidden Markov model that finds genes in E. coli DNA. , 1994, Nucleic acids research.

[73] Jun S. Liu,et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[74] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .

[75] E. Uberbacher,et al. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[76] Bernard Widrow,et al. 30 years of adaptive neural networks: perceptron, Madaline, and backpropagation , 1990, Proc. IEEE.

[77] D. Mccormick. Sequence the Human Genome , 1986, Bio/Technology.

[78] J. Ross Quinlan,et al. Induction of Decision Trees , 1986, Machine Learning.

[79] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[80] Melissa C. Greven,et al. An integrated encyclopedia of DNA elements in the human genome , 2014 .

[81] Hsuan-Tien Lin,et al. Learning From Data , 2012 .

[82] K. Pollard,et al. Detection of nonneutral substitution rates on mammalian phylogenies. , 2010, Genome research.

[83] Burr Settles,et al. Active Learning Literature Survey , 2009 .

[84] M. Brent. Steady progress and recent breakthroughs in the accuracy of automated genome annotation , 2008, Nature Reviews Genetics.

[85] James Bennett,et al. The Netflix Prize , 2007 .

[86] Teuvo Kohonen,et al. Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[87] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.

[88] Thomas G. Dietterich,et al. Bioinformatics The Machine Learning Approach 2nd ed. , 2001 .

[89] International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome , 2001, Nature.

[90] John J. Wyrick,et al. Genome-wide location and function of DNA binding proteins. , 2000, Science.

[91] Gregory R. Grant,et al. Bioinformatics - The Machine Learning Approach , 2000, Comput. Chem..

[92] Sean R. Eddy,et al. Profile hidden Markov models , 1998, Bioinform..

[93] Pedro M. Domingos,et al. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier , 1996, ICML.

[94] Alberto Maria Segre,et al. Programs for Machine Learning , 1994 .

[95] J. Mattick,et al. Genome research , 1990, Nature.

[96] T. Kohonen. Self-organized formation of topographically correct feature maps , 1982 .

[97] Peter E. Hart,et al. Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[98] Mark Gerstein,et al. A Supervised Hidden Markov Model Framework for Efficiently Segmenting Tiling Array Data in Transcriptional and ChIP-chip Experiments: Systematically Incorporating Validated Biological Knowledge , 2022 .

[99] Isaac Bentwich. Prediction and validation of microRNAs and their targets , 2005, FEBS letters.

[100] Christopher D. Brown,et al. Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE , 2010, Science.