Extending GelJ for interoperability: Filling the gap in the bioinformatics resources for population genetics analysis with dominant markers

BACKGROUND AND OBJECTIVE: Manually transforming DNA fingerprints of dominant markers into the input of tools for population genetics analysis is a time-consuming and error-prone task, especially when the researcher deals with a large number of samples. The situation worsens when the researcher needs several tools, because data formats are incompatible across tools. The goal of this work is to automate, starting from the banding patterns of gel images, the generation of input for the wide range of tools devoted to population genetics analysis.

METHODS: We analysed tools for population genetics analysis with dominant markers and tools for working with phylogenetic trees, and identified the input requirements of those systems. Programs devoted to phylogenetic trees widely employ the Newick and Nexus formats, whereas each population genetics analysis tool uses its own specific format. To handle this diversity of formats in the latter case, we have developed a new XML format, called PopXML, that takes into account the variety of information required by each population genetics analysis tool. This knowledge has been incorporated into the pipeline of the GelJ system, a tool for analysing DNA fingerprint gel images, to reach our automation goal.

RESULTS: We have implemented, in the GelJ system, a pipeline that automatically generates, from gel banding patterns, the input of tools for population genetics analysis and phylogenetic trees. The pipeline has been used to generate, from thousands of banding patterns, the input of 29 population genetics analysis tools and 32 tools for managing phylogenetic trees.

CONCLUSIONS: GelJ has become the first tool that fills the gap between gel image processing software and software for population genetics analysis with dominant markers, phylogenetic reconstruction, and tree editing. This has been achieved by automating the generation of input for the latter software from gel banding patterns processed by GelJ.
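
The conversion step described above, from banding patterns to tool input, can be illustrated with a minimal sketch. It assumes the banding patterns have already been scored as presence/absence values (1/0, with `None` for a missing score) at a shared set of band positions. The function name `to_nexus`, the lane names, and the sample data are hypothetical and are not part of GelJ; the output is a generic NEXUS DATA block, not necessarily the exact file GelJ produces.

```python
def to_nexus(samples):
    """Render a presence/absence matrix as a NEXUS DATA block.

    samples: dict mapping a sample name to a list of 1/0 scores,
             with None marking a missing score (hypothetical layout).
    """
    if not samples:
        raise ValueError("no samples given")
    lengths = {len(bands) for bands in samples.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must share the same band positions")
    nchar = lengths.pop()
    width = max(len(name) for name in samples)  # pad names so rows align

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(samples)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for name, bands in samples.items():
        # One character per band position: 1 = present, 0 = absent, ? = missing
        row = "".join("?" if b is None else str(int(b)) for b in bands)
        lines.append(f"  {name.ljust(width)}  {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Illustrative data only: three lanes scored at four band positions
example = {
    "lane_A": [1, 0, 1, 1],
    "lane_B": [0, 0, 1, None],
    "lane_C": [1, 1, 0, 1],
}
nexus_text = to_nexus(example)
print(nexus_text)
```

The same in-memory matrix could feed writers for other target formats, which is the role the abstract assigns to an intermediate representation such as PopXML.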
