StrucBreak: A Computational Framework for Structural Break Detection in DNA Sequences

Damage or breaks in DNA may change the characteristics of a genome and cause various diseases. In this work we construct a system that uses a maximum-likelihood-based probabilistic formula to assess the number of breaks that have occurred in any DNA sequence. The approach has been benchmarked on simulated data sets so that its outcomes can be compared with a ground truth or reference value. First, the order of the sequence data set is checked with the statistical cumulative sum (STACUMSUM). Prior and posterior probabilities are then computed for the verified sequences to estimate the percentages of breaks and mutations. Maximum-likelihood estimation then determines the numbers and positions of breaks and detections. In database manipulation, one factor that decides the orientation and order of the sequences is the geometric distance between consecutive sequences, which is measured to obtain a smooth representation of the genome or DNA sequences. Finally, we compared the performance of our system with DAMBE5 (A Comprehensive Software Package for Data Analysis in Molecular Biology and Evolution); in terms of time and space complexity, StrucBreak is much faster and consumes much less space owing to our algorithmic approaches.
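The two detection steps named above, a cumulative-sum check followed by a maximum-likelihood estimate of the break position, can be illustrated with a minimal sketch. This is not the StrucBreak or STACUMSUM implementation, which the abstract does not specify. It assumes a single break, a hypothetical 0/1 encoding of bases (1 for G or C, 0 otherwise), and a two-segment Bernoulli model; the function names `encode_gc`, `cusum_path`, and `ml_break` are illustrative only.

```python
import math


def encode_gc(sequence):
    """Assumed encoding: 1 for G or C, 0 for any other base."""
    return [1 if base in "GC" else 0 for base in sequence.upper()]


def cusum_path(x):
    """Cumulative sum of deviations from the overall mean.

    A pronounced extremum in this path suggests a shift in composition.
    """
    mean = sum(x) / len(x)
    path, running = [], 0.0
    for value in x:
        running += value - mean
        path.append(running)
    return path


def bernoulli_loglik(ones, n):
    """Maximised Bernoulli log-likelihood of a segment with `ones` successes."""
    if n == 0 or ones == 0 or ones == n:
        return 0.0  # degenerate segment: the fitted model has likelihood 1
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1 - p)


def ml_break(x):
    """Return (k, loglik): the split x[:k] | x[k:] with the highest likelihood."""
    n, total = len(x), sum(x)
    best_k, best_ll = None, -math.inf
    left_ones = 0
    for k in range(1, n):
        left_ones += x[k - 1]
        ll = bernoulli_loglik(left_ones, k) + bernoulli_loglik(total - left_ones, n - k)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll


if __name__ == "__main__":
    # Toy sequence: an AT-only segment followed by a GC-only segment.
    toy = encode_gc("AT" * 5 + "GC" * 5)
    path = cusum_path(toy)
    k_cusum = max(range(len(path)), key=lambda i: abs(path[i])) + 1
    k_ml, _ = ml_break(toy)
    print("CUSUM extremum after base", k_cusum)
    print("ML break after base", k_ml)
```

On the toy input both estimates place the break after base 10, where the composition changes. Detecting several breaks would need an extension such as recursive splitting with a model-selection penalty, which this sketch does not attempt.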
