Getting Started in Gene Orthology and Functional Analysis

The selection pressure on a gene, as revealed by its evolutionary history, is determined by the role the gene plays, i.e., its biological function. Because sequence conservation reflects this selection pressure, the degree to which a gene's sequence is conserved, combined with knowledge of when evolutionary events occurred along its history, provides clues about the gene's function. If a gene is retained in all species with high sequence similarity and has undergone only a few duplications during its evolution, we can have high confidence that its orthologs share the same function across species. Conversely, a large number of duplications and/or deletions along a gene's evolutionary history could indicate neofunctionalization and/or nonorthologous gene displacement [3], in which case orthologs in different genomes may have different functions. These observations highlight the importance of function-oriented ortholog identification.

In this article, we review the general procedures for identifying orthologs and constructing ortholog groups. We focus on the functional analysis of orthologs, review previous work assessing the functional consistency of orthologs, and suggest how better ortholog groups can be constructed. Finally, because orthologs can only be identified by examining the complete gene inventories of all species involved, the distribution of orthologs among species follows directly from the composition of ortholog groups. This composition, which carries important information for downstream research and applications, is also briefly discussed.
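
As a rough illustration of the signals discussed above, the sketch below summarizes a set of ortholog groups. It records whether each group is present in every species, how many extra gene copies it contains, and how the species distribution follows from group composition. This is a hypothetical sketch and not a method from this article. The input format, a dict mapping a group ID to (species, gene_id) pairs, is an assumption, as are the function names and the threshold. Counting extra copies per species is only a crude proxy for duplication events; real counts require reconciling gene trees with the species tree. Sequence similarity is not modeled at all.

```python
from __future__ import annotations

from collections import Counter


def group_composition(
    groups: dict[str, list[tuple[str, str]]], all_species: list[str]
) -> dict[str, dict]:
    """Summarize species coverage and copy number for each ortholog group.

    `groups` maps a (hypothetical) group ID to its (species, gene_id) members.
    """
    summary = {}
    for group_id, members in groups.items():
        # Number of genes contributed by each species to this group.
        copies = Counter(species for species, _gene in members)
        summary[group_id] = {
            "n_species": len(copies),
            # True when every species under study has at least one member.
            "universal": set(copies) >= set(all_species),
            # Copies beyond one per species: a crude proxy for lineage-specific
            # duplications (true counts need gene-tree/species-tree reconciliation).
            "extra_copies": sum(n - 1 for n in copies.values()),
        }
    return summary


def species_distribution(summary: dict[str, dict]) -> Counter:
    """Count how many groups span exactly k species."""
    return Counter(info["n_species"] for info in summary.values())


def likely_conserved_function(info: dict, max_extra_copies: int = 0) -> bool:
    """Heuristic: present in all species and few duplications.

    Sequence similarity is not modeled; the threshold is an assumption.
    """
    return info["universal"] and info["extra_copies"] <= max_extra_copies


if __name__ == "__main__":
    # Toy data (hypothetical): OG1 is 1:1 across three species,
    # OG2 is missing in yeast and duplicated in human.
    groups = {
        "OG1": [("human", "h1"), ("mouse", "m1"), ("yeast", "y1")],
        "OG2": [("human", "h2"), ("human", "h3"), ("mouse", "m2")],
    }
    summary = group_composition(groups, ["human", "mouse", "yeast"])
    print(species_distribution(summary))
    for group_id, info in summary.items():
        print(group_id, info, likely_conserved_function(info))
```

With the default `max_extra_copies=0`, only groups with exactly one gene in every species (strict one-to-one orthologs) are flagged. Raising the threshold tolerates a few lineage-specific duplications, at the cost of admitting groups where neofunctionalization is more plausible.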

References

[1] Leonid Peshkin, et al. Roundup: a multi-genome repository of orthologs and evolutionary distances. 2006, Bioinformatics.

[2] M. Huynen, et al. Benchmarking ortholog identification methods using functional genomics data. 2006, Genome Biology.

[3] M. Lynch, et al. The Origins of Genome Complexity. 2003, Science.

[4] M. Ashburner, et al. Gene Ontology: tool for the unification of biology. 2000, Nature Genetics.

[5] M. Gerstein, et al. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. 2004, Genome Research.

[6] T. Wdowiak, et al. Laser–Raman imagery of Earth's earliest fossils. 2002, Nature.

[7] W. Fitch. Homology: a personal view on some of the problems. 2000, Trends in Genetics.

[8] Mark Gerstein, et al. Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications. 2007, Bioinformatics.

[9] D. P. Wall, et al. Detecting putative orthologs. 2003, Bioinformatics.

[10] Haiyuan Yu, et al. Developing a similarity measure in biological function space. 2007.

[11] N. Friedman, et al. Natural history and evolutionary principles of gene duplication in fungi. 2007, Nature.

[12] Christophe Dessimoz, et al. Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods. 2009, PLoS Computational Biology.

[13] Peer Bork, et al. SMART 6: recent updates and new developments. 2008, Nucleic Acids Research.

[14] Sourav Bandyopadhyay, et al. Systematic identification of functional orthologs based on protein network comparison. 2006, Genome Research.

[15] Guy Perrière, et al. Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. 2005, Bioinformatics.

[16] Christian E. V. Storm, et al. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. 2001, Journal of Molecular Biology.

[17] Antoine Danchin, et al. Persistence drives gene clustering in bacterial genomes. 2008, BMC Genomics.

[18] Mark Gerstein, et al. Toward a systematic definition of protein function that scales to the genome level: defining function in terms of interactions. 2002, Proceedings of the IEEE.

[19] D. Lipman, et al. A genomic perspective on protein families. 1997, Science.

[20] Susumu Ohno. Evolution by Gene Duplication. 1970, Springer Berlin Heidelberg.

[21] T. Gabaldón. Large-scale assignment of orthology: back to phylogenetics? 2008, Genome Biology.

[22] C. Stoeckert, et al. OrthoMCL: identification of ortholog groups for eukaryotic genomes. 2003, Genome Research.

[23] E. Rocha. Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes? 2002, Trends in Microbiology.

[24] Christian von Mering, et al. eggNOG: automated construction and annotation of orthologous groups of genes. 2007, Nucleic Acids Research.

[25] R. Overbeek, et al. The use of gene clusters to infer functional coupling. 1999, Proceedings of the National Academy of Sciences of the United States of America.

[26] A. Force, et al. The probability of duplicate gene preservation by subfunctionalization. 2000, Genetics.

[27] Joaquín Dopazo, et al. PhylomeDB: a database for genome-wide collections of gene phylogenies. 2007, Nucleic Acids Research.

[28] Gaston H. Gonnet, et al. OMA Browser: Exploring orthologous relations across 352 complete genomes. 2007, Bioinformatics.

[29] T. Johnson. Reciprocal best hits are not a logically sufficient condition for orthology. 2007, arXiv:0706.0117.

[30] M. Gerstein, et al. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. 2000, Journal of Molecular Biology.

[31] Antoine Danchin, et al. How essential are nonessential genes? 2005, Molecular Biology and Evolution.

[32] Darren A. Natale, et al. The COG database: an updated version includes eukaryotes. 2003, BMC Bioinformatics.

[33] D. Eisenberg, et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. 1999, Proceedings of the National Academy of Sciences of the United States of America.

[34] P. Bork, et al. Non-orthologous gene displacement. 1996, Trends in Genetics.

[35] Ron D. Appel, et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis. 2003, Nucleic Acids Research.

[36] Albert J. Vilella, et al. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. 2009, Genome Research.

[37] M. Campbell, et al. PANTHER: a library of protein families and subfamilies indexed by function. 2003, Genome Research.

[38] L. Holm, et al. The Pfam protein families database. 2005, Nucleic Acids Research.

[39] M. Ruggero, et al. Similarity of Traveling-Wave Delays in the Hearing Organs of Humans and Other Tetrapods. 2007, Journal of the Association for Research in Otolaryngology.