Data compression can discriminate broilers by selection line, detect haplotypes, and estimate genetic potential for complex phenotypes

ABSTRACT Accurately establishing the relationships among individuals lays the foundation for genetic analyses such as genome‐wide association studies and identification of selection signatures. Of particular interest to the poultry industry are estimates of genetic merit based on molecular data. These estimates can be commercially exploited in marker‐assisted breeding programs to accelerate genetic improvement. Here, we test the utility of a method we recently developed to estimate animal relatedness and apply it to genetic parameter estimation in commercial broilers. Our approach is based on the concept of data compression from information theory. Using the real‐world compressor gzip to estimate normalized compression distance (NCD), we built compression‐based relationship matrices (CRM) for 988 chickens from 4 commercial broiler lines (2 male and 2 female lines). Across all pairs of individuals, we found a strong negative relationship between the corresponding entries of the commonly used genomic relationship matrix (GRM) and NCD. This reflects the fact that “similarity” is the inverse of “distance.” The CRM explained more genetic variation than the corresponding GRM in 2 of 3 phenotypes, with corresponding improvements in accuracy of genomic‐enabled predictions of breeding value. A sliding‐window version of the analysis highlighted haplotype regions of the genome apparently under selection in a line‐specific manner. In the male lines, we retrieved high population‐specific scores for IGF‐1 and a cognate receptor, INSR. For the female lines, we detected an extreme score for a region containing a reproductive hormone receptor (GNRHR). We conclude that our compression‐based method is a valid approach to establish relationships and identify regions under selective pressure in commercial lines of broiler chickens.
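As a rough illustration of the NCD computation mentioned in the abstract, the sketch below uses the standard definition NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes, with Python's gzip module standing in for the compressor. The genotype encoding (one byte per SNP, for example "0", "1", "2"), the function names, and the zero diagonal are assumptions made for this example; they are not taken from the paper and the authors' actual pipeline may differ.

```python
import gzip


def compressed_size(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input (C(x) in the NCD formula)."""
    return len(gzip.compress(data, compresslevel=9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both strings
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(genotypes):
    """Pairwise NCD for a list of genotype strings (hypothetical helper).

    Assumption: the diagonal is set to 0.0 by convention; a real compressor
    does not return exactly 0 for ncd(x, x).
    """
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = ncd(genotypes[i], genotypes[j])
    return matrix
```

Two animals with largely shared genotype strings compress well together, so their NCD is small; unrelated strings give values near 1. How such distances are turned into a relationship matrix (the CRM) is not specified in the abstract and is left out here.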
