A landmark extraction method for protein 2DE gel images based on multi-dimensional clustering

OBJECTIVE Two-dimensional electrophoresis (2DE) is a separation technique that can identify target proteins present in a tissue. Its result is a gel image in which each protein in the tissue appears as a spot. Because the technique suffers from low reproducibility, a user must manually annotate landmark spots on each gel image before spots from different images can be analyzed together. This manual annotation is tedious and error-prone. This paper therefore proposes a method that extracts landmark spots automatically by using a data mining technique.

METHOD AND MATERIAL A landmark profile, which summarizes the characteristics of the landmark spots in a set of training gel images of the same tissue, is generated by extracting the properties that those landmark spots have in common. On the basis of this profile, candidate landmark spots are identified in a new gel image of the same tissue, and the final landmark spots are determined by the well-known A* search algorithm.

RESULT AND CONCLUSIONS The performance of the proposed method is analyzed through a series of experiments that identify its various characteristics.
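The three steps named in the abstract (build a landmark profile, pick candidate spots, settle the final landmarks with A*) can be illustrated with a minimal Python sketch. The abstract does not state which spot properties form the profile or which cost the search minimizes, so everything here is an assumption made for illustration: the profile is reduced to a mean position and a radius per landmark, a candidate is any spot within `slack` times that radius, and A* minimizes the total distance between the chosen spots and the profile centers while using each spot at most once. The names `build_profile`, `find_candidates`, `assign_landmarks`, and `slack` are hypothetical and do not come from the paper.

```python
import heapq
from math import dist


def build_profile(training):
    """training: list of dicts {landmark_name: (x, y)}, one dict per annotated gel.
    Returns {landmark_name: (mean_position, radius)} (assumed profile shape)."""
    profile = {}
    for name in training[0]:
        pts = [gel[name] for gel in training]
        center = (sum(p[0] for p in pts) / len(pts),
                  sum(p[1] for p in pts) / len(pts))
        # Radius: largest deviation of a training landmark from its mean.
        radius = max(dist(center, p) for p in pts)
        profile[name] = (center, radius)
    return profile


def find_candidates(profile, spots, slack):
    """For each landmark, list (distance, spot_index) of spots near its center."""
    return {
        name: sorted((dist(center, s), i) for i, s in enumerate(spots)
                     if dist(center, s) <= slack * radius)
        for name, (center, radius) in profile.items()
    }


def assign_landmarks(profile, spots, slack=1.5):
    """A* over partial assignments; returns {landmark_name: spot_index} or None."""
    cands = find_candidates(profile, spots, slack)
    if any(not c for c in cands.values()):
        return None  # some landmark has no candidate spot at all
    # Assign the most constrained landmarks first.
    names = sorted(cands, key=lambda n: len(cands[n]))
    # h[k]: sum of the best candidate cost of landmarks k..end. It ignores the
    # "each spot used once" constraint, so it never overestimates (admissible).
    h = [0.0] * (len(names) + 1)
    for k in range(len(names) - 1, -1, -1):
        h[k] = h[k + 1] + cands[names[k]][0][0]
    heap = [(h[0], 0.0, ())]  # (f = g + h, g, chosen spot indices so far)
    while heap:
        _, g, chosen = heapq.heappop(heap)
        k = len(chosen)
        if k == len(names):
            return {names[i]: chosen[i] for i in range(k)}
        for d, i in cands[names[k]]:
            if i not in chosen:
                heapq.heappush(heap, (g + d + h[k + 1], g + d, chosen + (i,)))
    return None  # no conflict-free assignment exists


if __name__ == "__main__":
    training = [{"A": (10.0, 10.0), "B": (14.0, 10.0)},
                {"A": (12.0, 10.0), "B": (16.0, 10.0)}]
    new_gel_spots = [(13.0, 10.0), (9.0, 10.0), (100.0, 100.0)]
    print(assign_landmarks(build_profile(training), new_gel_spots, slack=3.0))
```

In this toy input the spot at (13, 10) is a candidate for both landmarks, so the search has to resolve the conflict rather than simply taking the nearest spot for each landmark; that conflict resolution is the part a plain nearest-neighbour rule would get wrong. A real profile would likely include more spot properties than position, which this sketch does not attempt to model.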
