Machine learning and genome annotation: a match meant to be?

[1]  L. Breiman Random Forests , 2001, Machine Learning.

[2]  Sugato Basu,et al.  Semi-Supervised Learning , 2019, Encyclopedia of Database Systems.

[3]  G. Natoli,et al.  Non-coding transcription at cis-regulatory elements: computational and experimental approaches. , 2013, Methods.

[4]  Kevin Y. Yip,et al.  Machine learning and genome annotation: a match meant to be? , 2013, Genome Biology.

[5]  Atina G. Coté,et al.  Evaluation of methods for modeling transcription factor sequence specificity , 2013, Nature Biotechnology.

[6]  William Stafford Noble,et al.  Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors , 2012, Genome research.

[7]  Kevin Y. Yip,et al.  Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors , 2012, Genome Biology.

[8]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[9]  Raymond K. Auerbach,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[10]  D. Bartel,et al.  Long noncoding RNAs in C. elegans , 2012, Genome research.

[11]  Graziano Pesole,et al.  Motif discovery and transcription factor binding sites before and after the next-generation sequencing era , 2012, Briefings Bioinform..

[12]  M. Yandell,et al.  A beginner's guide to eukaryotic genome annotation , 2012, Nature Reviews Genetics.

[13]  William Stafford Noble,et al.  Unsupervised pattern discovery in human chromatin structure through genomic segmentation , 2012, Nature Methods.

[14]  Michael Fernández,et al.  Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines , 2012, Nucleic acids research.

[15]  Manolis Kellis,et al.  ChromHMM: automating chromatin-state discovery and characterization , 2012, Nature Methods.

[16]  Kevin Y Yip,et al.  Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors , 2011, Genome Biology.

[17]  Yungki Park,et al.  Revisiting the negative example sampling problem for predicting protein-protein interactions , 2011, Bioinform..

[18]  Zhong Wang,et al.  Next-generation transcriptome assembly , 2011, Nature Reviews Genetics.

[19]  Michael A. Beer,et al.  Discriminative prediction of mammalian enhancers from DNA sequence. , 2011, Genome research.

[20]  Rolf Backofen,et al.  Computational discovery of human coding and non-coding transcripts with conserved splice sites , 2011, Bioinform..

[21]  Cole Trapnell,et al.  Computational methods for transcriptome annotation and quantification using RNA-seq , 2011, Nature Methods.

[22]  Raymond K. Auerbach,et al.  Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data. , 2011, Genome research.

[23]  Raymond K. Auerbach,et al.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project , 2010, Science.

[24]  Sarah A. Teichmann,et al.  Assessing Computational Methods of Cis-Regulatory Module Prediction , 2010, PLoS Comput. Biol..

[25]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[26]  T. Mikkelsen,et al.  The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[27]  Brendan J. Frey,et al.  Deciphering the splicing code , 2010, Nature.

[28]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature Biotechnology.

[29]  B. Ren,et al.  Genome-wide prediction of transcription factor binding sites using an integrated model , 2010, Genome Biology.

[30]  Cheng Soon Ong,et al.  mGene: accurate SVM-based gene finding with an application to nematode genomes. , 2009, Genome research.

[31]  Tim R. Mercer,et al.  Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities , 2008, PLoS Comput. Biol..

[32]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[33]  M. Gerstein,et al.  The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing , 2008, Science.

[34]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[35]  Allen D. Delaney,et al.  Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing , 2007, Nature Methods.

[36]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[37]  Jonathan Livny,et al.  Identification of small RNAs in diverse bacterial species. , 2007, Current opinion in microbiology.

[38]  A. Philippakis,et al.  Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities , 2006, Nature Biotechnology.

[39]  Inna Dubchak,et al.  VISTA Enhancer Browser—a database of tissue-specific human enhancers , 2006, Nucleic Acids Res..

[40]  P. D’haeseleer How does DNA sequence motif discovery work? , 2006, Nature Biotechnology.

[41]  Ivan Ovcharenko,et al.  Predicting tissue-specific enhancers in the human genome. , 2006, Genome research.

[42]  Wilfred W. Li,et al.  MEME: discovering and analyzing DNA and protein sequence motifs , 2006, Nucleic Acids Res..

[43]  Eugene Berezikov,et al.  Approaches to microRNA discovery , 2006, Nature Genetics.

[44]  William Stafford Noble,et al.  Choosing negative examples for the prediction of protein-protein interactions , 2006, BMC Bioinformatics.

[45]  E. Ukkonen,et al.  Genome-wide Prediction of Mammalian Enhancers Based on Analysis of Transcription-Factor Binding Affinity , 2006, Cell.

[46]  Michael R Brent,et al.  Genome annotation past, present, and future: how to define an ORF at each locus. , 2005, Genome research.

[47]  Bin Li,et al.  Limitations and potentials of current motif discovery algorithms , 2005, Nucleic acids research.

[48]  D. Haussler,et al.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. , 2005, Genome research.

[49]  Francesca Chiaromonte,et al.  Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. , 2005, Genome research.

[50]  E. Davidson,et al.  Gene regulatory networks for development. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[51]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[52]  M. Gerstein,et al.  Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. , 2004, Current opinion in microbiology.

[53]  Jason Weston,et al.  Mismatch string kernels for discriminative protein classification , 2004, Bioinform..

[54]  Simon Parsons,et al.  Bioinformatics: The Machine Learning Approach by P. Baldi and S. Brunak, 2nd edn, MIT Press, 452 pp., $60.00, ISBN 0-262-02506-X , 2004, The Knowledge Engineering Review.

[55]  Bernhard Schölkopf,et al.  Kernel Methods in Computational Biology , 2005 .

[56]  I. Jolliffe Principal Component Analysis , 2005 .

[57]  Michael Q. Zhang Computational prediction of eukaryotic protein-coding genes , 2002, Nature Reviews Genetics.

[58]  A. Telser Molecular Biology of the Cell, 4th Edition , 2002 .

[59]  T. Hubbard,et al.  Computational detection and location of transcription start sites in mammalian genomic DNA. , 2002, Genome research.

[60]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[61]  D. Botstein,et al.  Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF , 2001, Nature.

[62]  R. Guigó,et al.  An assessment of gene prediction accuracy in large DNA sequences. , 2000, Genome research.

[63]  G. Stormo Gene-finding approaches for eukaryotes. , 2000, Genome research.

[64]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[65]  D. S. Fields,et al.  Specificity, free energy and information content in protein-DNA interactions. , 1998, Trends in biochemical sciences.

[66]  M. Borodovsky,et al.  GeneMark.hmm: new solutions for gene finding. , 1998, Nucleic acids research.

[67]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[68]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[69]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[70]  David Haussler,et al.  A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA , 1996, ISMB.

[71]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[72]  D. Haussler,et al.  A hidden Markov model that finds genes in E. coli DNA. , 1994, Nucleic acids research.

[73]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[74]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[75]  E. Uberbacher,et al.  Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[76]  Bernard Widrow,et al.  30 years of adaptive neural networks: perceptron, Madaline, and backpropagation , 1990, Proc. IEEE.

[77]  D. Mccormick Sequence the Human Genome , 1986, Bio/Technology.

[78]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[79]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[80]  Melissa C. Greven,et al.  An integrated encyclopedia of DNA elements in the human genome , 2014 .

[81]  Hsuan-Tien Lin,et al.  Learning From Data , 2012 .

[82]  K. Pollard,et al.  Detection of nonneutral substitution rates on mammalian phylogenies. , 2010, Genome research.

[83]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[84]  M. Brent Steady progress and recent breakthroughs in the accuracy of automated genome annotation , 2008, Nature Reviews Genetics.

[85]  James Bennett,et al.  The Netflix Prize , 2007 .

[86]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[87]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[88]  Thomas G. Dietterich,et al.  Bioinformatics The Machine Learning Approach 2nd ed. , 2001 .

[89]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[90]  John J. Wyrick,et al.  Genome-wide location and function of DNA binding proteins. , 2000, Science.

[91]  Gregory R. Grant,et al.  Bioinformatics - The Machine Learning Approach , 2000, Comput. Chem..

[92]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[93]  Pedro M. Domingos,et al.  Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier , 1996, ICML.

[94]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[95]  J. Mattick,et al.  Genome research , 1990, Nature.

[96]  T. Kohonen Self-organized formation of topographically correct feature maps , 1982 .

[97]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[98]  Mark Gerstein,et al.  A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating validated biological knowledge , 2006, Bioinform..

[99]  Isaac Bentwich Prediction and validation of microRNAs and their targets , 2005, FEBS letters.

[100]  Christopher D. Brown,et al.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE , 2010, Science.