BDBM 1.0: A Desktop Application for Efficient Retrieval and Processing of High-Quality Sequence Data and Application to the Identification of the Putative Coffea S-Locus

Bioinformatics is one of the most important areas in modern biology, and creating high-quality scientific software to support it is a core activity of many researchers. In this context, high-quality sequence datasets are needed to make inferences on the evolution of species, genes, and gene families, or to obtain evidence for adaptive amino acid evolution, among other uses. Nevertheless, sequence data are very often spread over several databases, many useful genomes and transcriptomes are not annotated, and the available annotation may not cover the desired coding sequence isoform or may be unlikely to be accurate. Moreover, although the text-based FASTA format is quite simple and usable by most software applications, it raises a number of issues that may be critical depending on the software used to analyse such files. As a result, researchers without training in informatics often use only a fraction of all available data. These issues can be addressed using already available software applications, but no single easy-to-use piece of software allows all these tasks to be performed within the same graphical interface. Here we present such an application, named BDBM (Blast DataBase Manager). BDBM can be used to efficiently retrieve gene sequences from annotated and non-annotated genomes and transcriptomes. It can also be used to look for alternatives to existing annotations and to easily create reliable custom databases, which are essential for preparing high-quality datasets. The analyses that we performed on the Coffea canephora genome using BDBM, aimed at identifying the S-locus region (which harbours the genes involved in gametophytic self-incompatibility), led to the conclusion that there are two likely regions, one on chromosome 2 (around region 6600000–6650000) and another on chromosome 5 (around 15830000–15930000). These findings are discussed in the context of the evolution of gametophytic self-incompatibility in the Rubiaceae.
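The candidate regions above are reported as coordinate ranges on a chromosome. As an illustration only, the following minimal Python sketch shows how such a range could be cut out of a FASTA file. It is not BDBM's implementation: the function names `read_fasta` and `extract_region` are hypothetical, and the code assumes 1-based, inclusive coordinates, a convention the abstract does not state.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into a {identifier: sequence} dictionary.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines of a record are concatenated; blank lines are skipped.
    """
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return sequence[start..end], assuming 1-based inclusive coordinates."""
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("region is outside the sequence")
    return sequence[start - 1:end]


# Toy data standing in for a genome file; not real Coffea sequence.
toy = [">chr2 toy example", "ACGTAC", "GTTT", ">chr5", "GGGCCC"]
records = read_fasta(toy)
region = extract_region(records["chr2"], 3, 6)
```

With a real genome, the same call would take the chromosome identifier used in that file and the range of interest; the identifiers depend on the assembly and are not given in the abstract.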
