APPLES: Fast Distance-Based Phylogenetic Placement

Significance

Accurate description of protein structure and function is a fundamental step toward understanding biological life and is highly relevant to the development of therapeutics. Although greatly improved, experimental protein structure determination is still low-throughput and costly, especially for membrane proteins. As such, computational structure prediction is often resorted to. Predicting the structure of a protein without similar structures in the Protein Data Bank is very challenging and usually needs a large amount of computing power. This paper shows that by using a powerful deep learning technique, even with only a personal computer we can predict new folds much more accurately than ever before. This method also works well on membrane protein folding.

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling.
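To illustrate how a distance matrix alone can determine a 3D shape, the sketch below uses classical multidimensional scaling (MDS), a standard distance-geometry technique chosen here for illustration. It is not the folding pipeline described above. The function name `coords_from_distances` and the random test points are hypothetical, and the sketch assumes a complete, noise-free matrix of pairwise distances rather than predicted distance distributions.

```python
import numpy as np

def coords_from_distances(dist, dim=3):
    """Classical MDS: embed points in `dim` dimensions from a full distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centering turns squared distances into a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # The top `dim` eigenpairs of the Gram matrix give the coordinates.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical test: 30 random points stand in for residue positions.
rng = np.random.default_rng(0)
true_xyz = rng.normal(size=(30, 3))
dist = np.linalg.norm(true_xyz[:, None, :] - true_xyz[None, :, :], axis=-1)

model = coords_from_distances(dist)
rebuilt = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)
max_error = float(np.abs(rebuilt - dist).max())
```

The recovered coordinates match the originals only up to rotation, translation, and reflection, which is why the check compares pairwise distances rather than coordinates. With noisy or incomplete predicted distances, an exact embedding generally does not exist and some form of restraint-based optimization would be needed instead.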
Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
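The "top L/5 long-range" precision quoted above can be made concrete with a short sketch. It assumes the common contact-prediction convention, which this text does not state: two residues are in contact when their distance is below 8 Å, and a pair is long-range when the residues are at least 24 positions apart in sequence. The function name, the default thresholds, and the toy matrices are all hypothetical.

```python
import numpy as np

def top_l5_long_range_precision(prob, true_dist, min_sep=24, cutoff=8.0):
    """Fraction of the L/5 highest-scoring long-range pairs that are true contacts."""
    prob = np.asarray(prob, dtype=float)
    true_dist = np.asarray(true_dist, dtype=float)
    length = prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep are the long-range pairs.
    i, j = np.triu_indices(length, k=min_sep)
    k = max(1, length // 5)
    # Keep the k pairs with the highest predicted contact probability.
    order = np.argsort(prob[i, j])[::-1][:k]
    hits = true_dist[i[order], j[order]] < cutoff
    return float(hits.mean())

# Toy protein of length 50 with ten long-range contacts at 5 Angstroms.
length = 50
true_dist = np.full((length, length), 20.0)
contacts = [(a, a + 30) for a in range(10)]
for a, b in contacts:
    true_dist[a, b] = true_dist[b, a] = 5.0

# A predictor that ranks every true contact first.
prob = 1.0 / (1.0 + true_dist)
perfect = top_l5_long_range_precision(prob, true_dist)

# A predictor that misses half of the true contacts.
degraded = prob.copy()
for a, b in contacts[:5]:
    degraded[a, b] = degraded[b, a] = 0.0
half = top_l5_long_range_precision(degraded, true_dist)
```

Here L/5 is 10 pairs, so the first predictor scores 10 of 10 and the second scores 5 of 10. The metric rewards ranking quality among the most confident predictions and ignores everything below the top L/5.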

[217]  Lorien Y. Pratt,et al.  Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.