Foretelling the Phenotype of a Genomic Sequence

Estimating the phenotypic features (physical and biochemical traits) of a biological organism from its genomic sequence alone, from environmental conditions, or from both has major applications in, for example, anthropological paleontology and criminal forensics. To what extent do genomic sequences generally and causally determine the phenotypic features of organisms, environmental conditions aside? We present the results of two studies. The first examines larvae of two blackfly species (Insecta: Diptera: Simuliidae), Simulium ignescens and S. tunja, with four phenotypic features: the area and the spot pattern of the cephalic apotome (in the form of a Latin cross on the dorsal side of the head), the postgenal cleft (an area under the head on the ventral side), and general body color. The second examines strains of Arabidopsis thaliana. The studies establish that a substantial component of these phenotypic features (over 75 percent) is at least logically inferable from, if not causally determined by, genomic fragments alone, even though these features are not entirely determined by genetic traits. These results suggest that it is possible to infer the genetic contribution to specific phenotypic features of a biological organism from genomic sequences, without recourse to the causal chain of metabolomic and proteomic events that leads from those sequences to the features.
