Surveying and benchmarking techniques to analyse DNA gel fingerprint images

DNA fingerprinting is a genetic typing technique that allows the analysis of the genomic relatedness between samples, and the comparison of DNA patterns. The analysis of DNA gel fingerprint images usually consists of five consecutive steps: image pre-processing, lane segmentation, band detection, normalization and fingerprint comparison. In this article, we first survey the main methods that have been applied in the literature at each of these stages. Second, we focus on lane-segmentation and band-detection algorithms, as these are the steps that usually require user intervention, and identify the seven core algorithms used for both tasks. Subsequently, we present a benchmark that includes a data set of images, the gold standards associated with those images and the tools to measure the performance of lane-segmentation and band-detection algorithms. Finally, we implement the core algorithms used for both lane segmentation and band detection, and evaluate their performance using our benchmark. From this study, we conclude that the average profile algorithm is the best starting point for both lane segmentation and band detection.
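
To make the idea of an average profile concrete, the following is a minimal sketch in plain Python and not the implementation evaluated in the article. It assumes a grayscale image stored as a list of rows with bright bands on a dark background, and it assumes a simple midpoint threshold to split the profile into regions. The function names (`average_profile`, `runs_above`, `segment_lanes`, `detect_bands`) and the thresholding rule are illustrative choices, not taken from the source.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def average_profile(image: Image, axis: str = "columns") -> List[float]:
    """Mean intensity of every column (axis="columns") or every row (axis="rows")."""
    if axis == "columns":
        n_rows = len(image)
        return [sum(col) / n_rows for col in zip(*image)]
    return [sum(row) / len(row) for row in image]


def runs_above(profile: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a new run begins
        elif value <= threshold and start is not None:
            runs.append((start, i))  # the current run ends
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lanes(image: Image) -> List[Tuple[int, int]]:
    """Lanes as column ranges whose average intensity exceeds the midpoint threshold."""
    profile = average_profile(image, "columns")
    threshold = (min(profile) + max(profile)) / 2  # assumed rule, for illustration
    return runs_above(profile, threshold)


def detect_bands(image: Image, lane: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Bands as row ranges inside one lane, found on the lane's row-wise profile."""
    left, right = lane
    lane_pixels = [row[left:right] for row in image]
    profile = average_profile(lane_pixels, "rows")
    threshold = (min(profile) + max(profile)) / 2  # assumed rule, for illustration
    return runs_above(profile, threshold)
```

Calling `segment_lanes(image)` yields one column range per lane, and each range can then be passed to `detect_bands(image, lane)` to obtain the row ranges of its bands. Real gel images would typically need the pre-processing step (for example background removal or smoothing) before a fixed threshold of this kind becomes reliable.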
