Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

Background

Annotation of protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation, and the assignment of function involve information from various sources. This often leads to a collection of heterogeneous data that is hard to track. Cytoskeletal and motor proteins form large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families.

Description

Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations, and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content.

Conclusion

We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
