Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants

Finding reliable SNPs in polyploids is challenging. Machine learning is an efficient tool for refining SNP calling from NGS data of polyploids. The SNP‐ML tool was designed to facilitate SNP calling.
