Low coverage whole genome sequencing reveals the underlying structure of European sardine populations

Whole-genome sequence data are an ideal tool for characterizing processes in ecology and evolution. Although sequencing costs have fallen, producing a genome and high-coverage resequencing data for a non-model species can still be challenging. New population genomics pipelines based on genotype likelihoods substantially reduce costs by efficiently extracting information from low-coverage sequence data. We demonstrate the robustness of such approaches with a genomic data set consisting of two draft genomes of the European sardine (Sardina pilchardus, Walbaum 1792) and resequencing data (~1.5× depth) for 78 individuals from 12 sampling locations across the 5,000 km of the species’ distribution range (from the Eastern Mediterranean to the archipelagos of Madeira and the Azores). Our results clearly show at least three genetic clusters: the first comprises individuals from the Azores and Madeira (two archipelagos in the Atlantic), the second corresponds to Iberia (the center of the sampling distribution), and the third groups the Mediterranean samples with those from the Canary Islands. This suggests at least two important barriers to gene flow, although these do not appear to be complete, as individuals from Iberia show some degree of admixture. Together with the genetic resources generated for this commercially important taxon, these results provide a baseline for further studies aimed at identifying the nature of these barriers between sardine populations, and inform transnational stock management of this highly exploited species towards sustainable fisheries.
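
As an illustration of why genotype likelihoods suit low-coverage data, the following is a minimal sketch of a textbook likelihood model for a single biallelic site. It is not the pipeline used in this study: the function name `genotype_likelihoods`, the fixed per-read error rate of 0.01, and the assumption of independent reads are all illustrative choices.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Normalized likelihoods of the genotypes with 0, 1, or 2 copies of
    the alternate allele, given read counts at one biallelic site.

    Hypothetical simplified model: reads are independent and each read
    is miscalled with the same probability `error` (an assumption).
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2.0
        # Chance that a single read shows the alternate allele:
        # a correct read from an alt chromosome, or an erroneous
        # read from a ref chromosome.
        p_alt = frac_alt * (1.0 - error) + (1.0 - frac_alt) * error
        # The binomial coefficient is the same for every genotype,
        # so it cancels in the normalization below and is omitted.
        likelihoods.append((p_alt ** n_alt) * ((1.0 - p_alt) ** n_ref))
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


if __name__ == "__main__":
    # One read carrying the alternate allele, as is typical at ~1.5x depth.
    print(genotype_likelihoods(n_ref=0, n_alt=1))
```

With a single read, the heterozygous genotype keeps about one third of the normalized likelihood, so calling a hard genotype would discard real uncertainty. Carrying the three likelihoods forward into downstream analyses is what allows information to be pooled across many sites and individuals.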