Discovery-Driven Exploration Method in Lung Cancer 2-DE Gel Images Using the Data Cube

ABSTRACT In proteomics research, identifying proteins that are differentially expressed under specific conditions is one of the key issues. Changes in a specific protein's expression level can be detected in several ways, such as statistical analysis and graphical visualization. However, it is quite difficult to handle the spot information of individual proteins manually with these methods, because a tissue sample contains a considerable number of proteins. In this paper, we propose applying the OLAP data cube and discovery-driven exploration, drawing on database and data mining techniques. With data cubes, it is possible to analyze the relationship between proteins and relevant clinical information, as well as the proteins that are differentially expressed by disease. We propose a measure and exception indicators suited to analyzing changes in protein expression level. In addition, we propose a method that reduces the calculation of InExp in discovery-driven exploration. We also evaluate the utility and effectiveness of the data cube and discovery-driven exploration on lung cancer 2-DE gel images.

Keywords: Proteome Informatics, Data Mining, On-Line Analytical Processing, Two-Dimensional Electrophoresis
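As a rough illustration of the idea of an exception indicator on a data cube, the following minimal sketch flags cells of a two-dimensional (protein by condition) cube whose value deviates strongly from what the other cells would lead one to expect. It is not the measure or the indicators proposed in the paper: the additive expectation model, the function name `self_exceptions`, the threshold of 2.0, and the toy spot intensities are all assumptions made for illustration, and only a cell-level indicator is shown (InExp, which summarizes exceptions beneath a cell, is not computed here).

```python
from statistics import mean, pstdev


def self_exceptions(cube, threshold=2.0):
    """Flag cells whose value is surprising under a simple additive model.

    cube maps (protein, condition) -> aggregated spot intensity and must
    contain every protein/condition combination. The additive model and
    the threshold are illustrative assumptions, not the paper's method.
    """
    proteins = sorted({p for p, _ in cube})
    conditions = sorted({c for _, c in cube})

    grand = mean(cube.values())
    protein_mean = {p: mean(cube[(p, c)] for c in conditions) for p in proteins}
    condition_mean = {c: mean(cube[(p, c)] for p in proteins) for c in conditions}

    # Residual = observed value minus the value anticipated from the
    # protein-level and condition-level averages.
    residual = {
        (p, c): cube[(p, c)] - (protein_mean[p] + condition_mean[c] - grand)
        for p in proteins
        for c in conditions
    }

    sigma = pstdev(residual.values())
    if sigma == 0:
        return {}  # every cell matches the model exactly

    # Keep only cells whose scaled residual exceeds the threshold.
    return {
        cell: r / sigma
        for cell, r in residual.items()
        if abs(r / sigma) > threshold
    }


# Toy cube (made-up intensities): values follow an additive pattern,
# except that P2 is strongly over-expressed in the "squamous" condition.
base = {"P1": 10.0, "P2": 20.0, "P3": 30.0, "P4": 40.0}
shift = {"normal": 0.0, "adenocarcinoma": 1.0, "squamous": 2.0}
cube = {(p, c): base[p] + shift[c] for p in base for c in shift}
cube[("P2", "squamous")] += 12.0

flagged = self_exceptions(cube)
for (protein, condition), score in flagged.items():
    print(protein, condition, round(score, 2))
```

In this toy cube only the perturbed cell exceeds the threshold. A real analysis would need a more robust expectation model and more dimensions, such as clinical attributes, than this two-dimensional sketch uses.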
