New Genomic Information Systems (GenISs): Species Delimitation and IDentification

Genomic Information Systems (GenISs) were recently proposed as a universal framework for feature extraction, dimensionality reduction and more effective processing of genomic data. They rely on methodologies anchored more firmly in biochemical reality and exploit newly discovered structure in DNA spaces to extract and represent genomic data in compact data structures that are rich enough to answer critical questions about the original organisms, including phylogenies, species identification and, more recently, phenotypic information. They work from DNA sequences alone (possibly including full genomes), run in a matter of minutes or hours, and produce answers consistent with well-established biological knowledge. Here, we introduce a second family of GenISs based on further structural properties of DNA spaces and demonstrate that they can also be used to provide principled, general and intuitive solutions to fundamental questions in biology such as “What exactly is a biological species?” Current answers to such all-important questions remain dependent on specific taxa and subject to analyst choices. We further discuss other applications to be explored in the future, including universal biological taxonomies in the quest for a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on Earth.
