A Survey of Computational Methods for Protein Function Prediction

Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Automated protein function annotation, or prediction, is a prime problem for computational methods to tackle in the post-genomic era of big molecular data. While recent community-driven experiments demonstrate that the accuracy of function prediction methods has significantly improved, challenges remain. These challenges relate to the different sources of data exploited to predict function, as well as to the different choices made in representing and integrating heterogeneous data. Current methods predict function from a protein’s sequence, often in the context of evolutionary relationships; from a protein’s three-dimensional structure or specific patterns in that structure; from neighbors in a protein–protein interaction network; from microarray data; or from a combination of these data types. Here we review these methods and the state of protein function prediction, emphasizing recent algorithmic developments, remaining challenges, and prospects for future research.


[242]  Cajo J. F. ter Braak,et al.  Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes , 2013, Algorithms for Molecular Biology.

[243]  Jan Komorowski,et al.  Classification of Gene Expression Data in an Ontology , 2001, ISMDA.

[244]  Anton J. Enright,et al.  Detection of functional modules from protein interaction networks , 2003, Proteins.

[245]  Matthias E. Futschik,et al.  UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks , 2013, Nucleic Acids Res..

[246]  Amos Bairoch,et al.  The PROSITE database , 2005, Nucleic Acids Res..

[247]  Trupti Joshi,et al.  Quantitative assessment of relationship between sequence similarity and function similarity , 2007, BMC Genomics.

[248]  Cathy H. Wu,et al.  Protein classification artificial neural system , 1992, Protein science : a publication of the Protein Society.

[249]  S. Pongor,et al.  Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison. , 2002, Journal of molecular biology.

[250]  Jason Weston,et al.  Mismatch string kernels for discriminative protein classification , 2004, Bioinform..

[251]  Martin Raff,et al.  From RNA to Protein , 2002 .

[252]  Atul J. Butte,et al.  Comparing the Similarity of Time-Series Gene Expression Using Signal Processing Metrics , 2001, J. Biomed. Informatics.

[253]  M. Levitt,et al.  Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core , 1993, Current Biology.

[254]  Mona Singh,et al.  How and when should interactome-derived clusters be used to predict functional modules and protein function? , 2009, Bioinform..

[255]  Sayan Mukherjee,et al.  Classifying Microarray Data Using Support Vector Machines , 2003 .

[256]  James C. Bezdek,et al.  Decision templates for multiple classifier fusion: an experimental comparison , 2001, Pattern Recognit..

[257]  B. Snel,et al.  Predicting gene function by conserved co-expression. , 2003, Trends in genetics : TIG.

[258]  H. Mewes,et al.  SNAPping up functionally related genes based on context information: a colinearity-free approach. , 2001, Journal of molecular biology.

[259]  S. Oliver From DNA sequence to biological function , 1996, Nature.

[260]  J. Chen,et al.  HAPPI: an online database of comprehensive human annotated and predicted protein interactions , 2009, BMC Genomics.

[261]  Susan Lindquist,et al.  Mechanisms of protein-folding diseases at a glance , 2014, Disease Models & Mechanisms.

[262]  Sara Ballouz,et al.  Measuring the wisdom of the crowds in network-based gene function inference , 2015, Bioinform..

[263]  Jun Kong,et al.  MEROPS: the peptidase database. , 2004, Nucleic acids research.

[264]  Ursula Pieper,et al.  SALIGN: a web server for alignment of multiple protein sequences and structures , 2012, Bioinform..

[265]  Douglas L. Brutlag,et al.  Remote homology detection: a motif based approach , 2003, ISMB.

[266]  Jianhua Xuan,et al.  BMRF-Net: a software tool for identification of protein interaction subnetworks by a bagging Markov random field-based method , 2015, Bioinform..

[267]  Tamás Nepusz,et al.  SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale , 2010, BMC Bioinformatics.

[268]  Darren A. Natale,et al.  The COG database: an updated version includes eukaryotes , 2003, BMC Bioinformatics.

[269]  David B. Dunson,et al.  Probabilistic topic models , 2011, KDD '11 Tutorials.

[270]  R. Sharan,et al.  Network-based prediction of protein function , 2007, Molecular systems biology.

[271]  Saikat Chakrabarti,et al.  SMoS: a database of structural motifs of protein superfamilies. , 2003, Protein engineering.

[272]  Yiannis Kourmpetis,et al.  Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data , 2010, PloS one.

[273]  A. Valencia,et al.  Similarity of phylogenetic trees as indicator of protein-protein interaction. , 2001, Protein engineering.

[274]  B. Frey,et al.  The functional landscape of mouse gene expression , 2004, Journal of biology.

[275]  C. S. Möller-Leveta,et al.  Clustering of unevenly sampled gene expression time-series data , 2005 .

[276]  Aidong Zhang,et al.  Predicting Protein Function by Frequent Functional Association Pattern Mining in Protein Interaction Networks , 2010, IEEE Transactions on Information Technology in Biomedicine.

[277]  Davide Heller,et al.  STRING v10: protein–protein interaction networks, integrated over the tree of life , 2014, Nucleic Acids Res..

[278]  Adam Godzik,et al.  Flexible algorithm for direct multiple alignment of protein structures and sequences , 1994, Comput. Appl. Biosci..

[279]  Christian Schaefer,et al.  Homology-based inference sets the bar high for protein function prediction , 2013, BMC Bioinformatics.

[280]  Shoudan Liang,et al.  Local Network Topology in Human Protein Interaction Data Predicts Functional Association , 2009, PloS one.

[281]  Daniel W. A. Buchan,et al.  A large-scale evaluation of computational protein function prediction , 2013, Nature Methods.

[282]  Risi Kondor,et al.  Diffusion kernels on graphs and other discrete structures , 2002, ICML 2002.

[283]  J M Gauthier,et al.  Protein--protein interaction maps: a lead towards cellular functions. , 2001, Trends in genetics : TIG.

[284]  William Stafford Noble,et al.  Learning to predict protein-protein interactions from protein sequences , 2003, Bioinform..

[285]  Sean R. Eddy,et al.  Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..

[286]  Jugal K. Kalita,et al.  A new approach for clustering gene expression time series data , 2009, Int. J. Bioinform. Res. Appl..

[287]  See-Kiong Ng,et al.  On combining multiple microarray studies for improved functional classification by whole-dataset feature selection. , 2003, Genome informatics. International Conference on Genome Informatics.

[288]  E. Sprinzak,et al.  Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. , 1999, Genome research.

[289]  Paulo J. G. Lisboa,et al.  Finding reproducible cluster partitions for the k-means algorithm , 2013, BMC Bioinformatics.

[290]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[291]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[292]  E. Lander,et al.  Development and Applications of CRISPR-Cas9 for Genome Engineering , 2014, Cell.

[293]  Stanley Letovsky,et al.  Predicting protein function from protein/protein interaction data: a probabilistic approach , 2003, ISMB.

[294]  R. Durbin,et al.  Pfam: A comprehensive database of protein domain families based on seed alignments , 1997, Proteins.

[295]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[296]  Erik L. L. Sonnhammer,et al.  Predicting protein function from domain content , 2008, Bioinform..

[297]  Peter Uetz,et al.  MPIDB: the microbial protein interaction database , 2008, Bioinform..

[298]  Alioune Ngom,et al.  Fast Protein Superfamily Classification Using Principal Component Null Space Analysis , 2005, Canadian Conference on AI.

[299]  Jean-Michel Claverie,et al.  Phydbac (phylogenomic display of bacterial genes): an interactive resource for the annotation of bacterial genomes , 2003, Nucleic Acids Res..

[300]  Jan Komorowski,et al.  Predicting Gene Function from Gene Expressions and Ontologies , 2000, Pacific Symposium on Biocomputing.

[301]  Jason Weston,et al.  Learning Gene Functional Classifications from Multiple Data Types , 2002, J. Comput. Biol..

[302]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[303]  Walter R. Gilks,et al.  Probabilistic annotation of protein sequences based on functional classifications , 2005, BMC Bioinformatics.

[304]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[305]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[306]  Qicheng Ma,et al.  Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks , 2005, BMC Bioinformatics.

[307]  Ting Chen,et al.  Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks , 2014, Briefings Bioinform..

[308]  Peter D. Karp,et al.  A systematic study of genome context methods: calibration, normalization and combination , 2010, BMC Bioinformatics.

[309]  Chien-Yu Chen,et al.  Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies , 2006, Pattern Recognit..

[310]  Tapio Salakoski,et al.  An expanded evaluation of protein function prediction methods shows an improvement in accuracy , 2016, Genome Biology.

[311]  M. Gerstein,et al.  The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. , 1999, Journal of molecular biology.

[312]  Robert D. Finn,et al.  InterPro in 2011: new developments in the family and domain prediction database , 2011, Nucleic acids research.

[313]  Avi Shoshan,et al.  Large-scale protein annotation through gene ontology. , 2002, Genome research.

[314]  O Gascuel,et al.  BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. , 1997, Molecular biology and evolution.

[315]  Xiaolong Wang,et al.  A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis , 2008, BMC Bioinformatics.

[316]  D. Kihara,et al.  PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data , 2009, Proteins.

[317]  A. Valencia,et al.  Practical limits of function prediction , 2000, Proteins.

[318]  Daisuke Kihara,et al.  Enhanced automated function prediction using distantly related sequences and contextual association by PFP , 2006, Protein science : a publication of the Protein Society.

[319]  Marinka Zitnik,et al.  Data Fusion by Matrix Factorization , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[320]  Cathy H. Wu,et al.  Protein classification using a neural network database system , 1991, ANNA '91.

[321]  S. Brenner,et al.  Expectations from structural genomics , 2008, Protein science : a publication of the Protein Society.

[322]  Robert B Russell,et al.  A model for statistical significance of local similarities in structure. , 2003, Journal of molecular biology.

[323]  Hsiang-Yuan Yeh,et al.  Inferring drug-disease associations from integration of chemical, genomic and phenotype data using network propagation , 2013, BMC Medical Genomics.

[324]  I. Uchiyama Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes , 2006, Nucleic acids research.

[325]  Tetsushi Sakuma,et al.  Precise Correction of the Dystrophin Gene in Duchenne Muscular Dystrophy Patient Induced Pluripotent Stem Cells by TALEN and CRISPR-Cas9 , 2014, Stem cell reports.

[326]  Kunchur Guruprasad,et al.  Database of Structural Motifs in Proteins , 2000, Bioinform..

[327]  P. Bork,et al.  Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs , 2004, Nature Biotechnology.

[328]  Boris Hayete,et al.  GOTrees: Predicting GO Associations from Protein Domain Composition Using Decision Trees , 2004, Pacific Symposium on Biocomputing.

[329]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[330]  Blatt,et al.  Superparamagnetic clustering of data. , 1998, Physical review letters.

[331]  Ankit Malhotra,et al.  Efficient CRISPR/Cas9-Mediated Genome Editing in Mice by Zygote Electroporation of Nuclease , 2015, Genetics.

[332]  Alfonso Valencia,et al.  Automatic annotation of protein function based on family identification , 2003, Proteins.

[333]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[334]  Asa Ben-Hur,et al.  Hierarchical Classification of Gene Ontology Terms Using the Gostruct Method , 2010, J. Bioinform. Comput. Biol..

[335]  M. Samanta,et al.  Predicting protein functions from redundancies in large-scale protein interaction networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[336]  Paolo Fontana,et al.  Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms , 2012, BMC Bioinformatics.

[337]  Gary D Bader,et al.  BIND--The Biomolecular Interaction Network Database. , 2001, Nucleic acids research.

[338]  David Botstein,et al.  The Stanford Microarray Database , 2001, Nucleic Acids Res..

[339]  Raymond K. Auerbach,et al.  The real cost of sequencing: higher than you think! , 2011, Genome Biology.

[340]  M. Gerstein,et al.  Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. , 2002, Genome research.

[341]  Stefano Toppo,et al.  Enhancing protein function prediction with taxonomic constraints--The Argot2.5 web server. , 2016, Methods.

[342]  R. Albert Network Inference, Analysis, and Modeling in Systems Biology , 2007, The Plant Cell Online.

[343]  Ting Chen,et al.  Mapping gene ontology to proteins based on protein-protein interaction data , 2004, Bioinform..

[344]  Nathan Linial,et al.  ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree , 2011, Nucleic Acids Res..

[345]  H. M. Marvin,et al.  Heart Disease , 1854, Hall's journal of health.

[346]  Peter D. Karp,et al.  EcoCyc: a comprehensive database resource for Escherichia coli , 2004, Nucleic Acids Res..

[347]  M Ouali,et al.  Cascaded multiple classifiers for secondary structure prediction , 2000, Protein science : a publication of the Protein Society.

[348]  Renzhi Cao,et al.  Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. , 2016, Methods.

[349]  Andrew J. Martin,et al.  The ups and downs of protein topology; rapid comparison of protein structure. , 2000, Protein engineering.

[350]  Roded Sharan,et al.  Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[351]  P. Radivojac,et al.  Analysis of protein function and its prediction from amino acid sequence , 2011, Proteins.

[352]  Simon Kasif,et al.  Genomic functional annotation using co-evolution profiles of gene clusters , 2002, Genome Biology.

[353]  Giorgio Valentini,et al.  UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions , 2015, J. Comput. Biol..

[354]  Duane Szafron,et al.  Improving Protein Function Prediction using the Hierarchical Structure of the Gene Ontology , 2005, 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[355]  John H. Morris,et al.  Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution , 2011, Bioinform..

[356]  George Church,et al.  Titin mutations in iPS cells define sarcomere insufficiency as a cause of dilated cardiomyopathy , 2015, Science.

[357]  J. Schug,et al.  Predicting gene ontology functions from ProDom and CDD protein domains. , 2002, Genome research.

[358]  Carlos Prieto,et al.  APID: Agile Protein Interaction DataAnalyzer , 2006, Nucleic Acids Res..

[359]  Joël Pothier,et al.  YAKUSA: A fast structural database scanning method , 2005, Proteins.

[360]  Dmitrij Frishman,et al.  SNAPper: gene order predicts gene function , 2002, Bioinform..

[361]  Sung-Hou Kim,et al.  Global mapping of the protein structure space and application in structure-based inference of protein function. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[362]  Warren C. Lathe,et al.  Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. , 2000, Genome research.

[363]  Andrea Califano,et al.  SPLASH: structural pattern localization analysis by sequential histograms , 2000, Bioinform..

[364]  Simon Kasif,et al.  Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data , 2007, PloS one.

[365]  W. Wong,et al.  Transitive functional annotation by shortest-path analysis of gene expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[366]  X. Chen,et al.  SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence , 2003, Nucleic Acids Res..

[367]  Reinhard Guthke,et al.  Gene Expression Data Mining for Functional Genomics , 2001 .

[368]  Geoffrey J. Barton,et al.  GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes , 2004, BMC Bioinformatics.

[369]  O. Wolkenhauer,et al.  Clustering of Gene Expression Time-Series Data , 2003 .

[370]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[371]  Anushya Muruganujan,et al.  Large-scale gene function analysis with the PANTHER classification system , 2013, Nature Protocols.

[372]  Rachel Kolodny,et al.  Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. , 2005, Journal of molecular biology.

[373]  S. Kasif,et al.  Whole-genome annotation by using evidence integration in functional-linkage networks. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[374]  Jeffrey T. Chang,et al.  Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. , 2002, Genome research.

[375]  Sándor Pongor,et al.  The SBASE protein domain library, release 9.0: an online resource for protein domain identification , 2002, Nucleic Acids Res..

[376]  Ziv Bar-Joseph,et al.  Clustering short time series gene expression data , 2005, ISMB.

[377]  Weidong Tian,et al.  GoFDR: A sequence alignment based method for predicting protein functions. , 2016, Methods.

[378]  A Aszódi,et al.  High-throughput functional annotation of novel gene products using document clustering. , 2000, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[379]  George M. Whitson,et al.  PROCANS: a protein classification system using a neural network , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[380]  Michal Linial,et al.  ProFET: Feature engineering captures high-level protein functions , 2015, Bioinform..

[381]  Rumen Andonov,et al.  Algorithm engineering for optimal alignment of protein structure distance matrices , 2011, Optim. Lett..

[382]  Robert Petryszak,et al.  ArrayExpress update—simplifying data submissions , 2014, Nucleic Acids Res..

[383]  Daniel Hanisch,et al.  Co-clustering of biological networks and gene expression data , 2002, ISMB.

[384]  J. Skolnick,et al.  Access the most recent version at doi: 10.1110/ps.49201 References , 2000 .

[385]  Mark D'Souza,et al.  Use of contiguity on the chromosome to predict functional coupling , 1998, Silico Biol..

[386]  Inbal Budowski-Tal,et al.  FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately , 2010, Proceedings of the National Academy of Sciences.

[387]  Jonathan Qiang Jiang,et al.  Learning Protein Functions from Bi-relational Graph of Proteins and Function Annotations , 2011, WABI.

[388]  Kara Dolinski,et al.  Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) , 2002, Nucleic Acids Res..

[389]  Adam Zemla,et al.  LGA: a method for finding 3D similarities in protein structures , 2003, Nucleic Acids Res..

[390]  Y. Freund,et al.  Profile-based string kernels for remote homology detection and motif extraction. , 2005, Journal of bioinformatics and computational biology.

[391]  J. Hartigan Direct Clustering of a Data Matrix , 1972 .

[392]  P. Bork,et al.  Protein sequence motifs. , 1996, Current opinion in structural biology.

[393]  H. Mewes,et al.  The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. , 2004, Nucleic acids research.

[394]  David Martin,et al.  Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network , 2003, Genome Biology.

[395]  Das Amrita,et al.  Mining Association Rules between Sets of Items in Large Databases , 2013 .

[396]  Jeffry D. Sander,et al.  CRISPR-Cas systems for editing, regulating and targeting genomes , 2014, Nature Biotechnology.

[397]  O. Troyanskaya,et al.  Predicting gene function in a hierarchical context with an ensemble of classifiers , 2008, Genome Biology.

[398]  G. Thode,et al.  Search for ancient patterns in protein sequences , 1996, Journal of Molecular Evolution.

[399]  Michael J. E. Sternberg,et al.  ConFunc - functional annotation in the twilight zone , 2008, Bioinform..