Data Analysis of Multiplex Sequencing at SOLiD Platform: A Probabilistic Approach to Characterization and Reliability Increase

New sequencing technologies such as Illumina/Solexa, SOLiD/ABI, and 454/Roche have revolutionized biological research. Among them, the SOLiD platform offers a particular sequencing mode, known as a multiplex run, which allows several samples to be sequenced in a single run. This reduces cost and simplifies the analysis of related samples. However, this sequencing mode requires an additional filtering step to ensure the reliability of the results. In this paper, we therefore propose a probabilistic model that uses the intrinsic characteristics of each sequencing run to characterize multiplex runs and filter low-quality data, increasing the reliability of data analysis for multiplex sequencing performed on SOLiD. The results show that the proposed model is satisfactory because it: 1) identifies faults in the sequencing process; 2) supports the adaptation and development of new protocols for sample preparation; 3) assigns a degree of confidence to the data generated; and 4) guides the filtering process without arbitrarily discarding useful sequences.
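As a rough illustration of what assigning a degree of confidence and filtering on it can look like, the sketch below scores each read by the probability that all of its barcode base calls are correct. It is not the model proposed in the paper: it assumes Phred-scaled quality values (error probability 10^(-Q/10)) and independent errors across positions, and the names `barcode_confidence`, `filter_reads`, and the 0.9 threshold are hypothetical.

```python
def barcode_confidence(quals):
    """Probability that every barcode call is correct.

    Assumes Phred-scaled quality values and independent errors
    (an illustrative assumption, not the paper's model).
    """
    p = 1.0
    for q in quals:
        p *= 1.0 - 10.0 ** (-q / 10.0)
    return p


def filter_reads(reads, min_confidence=0.9):
    """Keep (name, confidence) pairs whose confidence meets the threshold.

    `reads` is an iterable of (name, barcode_quality_values) tuples;
    the threshold is a hypothetical default.
    """
    kept = []
    for name, quals in reads:
        conf = barcode_confidence(quals)
        if conf >= min_confidence:
            kept.append((name, conf))
    return kept


if __name__ == "__main__":
    example = [
        ("read_1", [30, 30, 30, 30, 30]),  # high-quality barcode
        ("read_2", [5, 8, 10, 6, 7]),      # low-quality barcode
    ]
    for name, conf in filter_reads(example):
        print(name, round(conf, 4))
```

Because each read receives an explicit score, the threshold can be tuned per run instead of discarding reads by a fixed, arbitrary rule.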
