Efficient Clustering for Gene Expression Data

Over the past decade, advances in technology have tremendously increased the amount of biological data, such as DNA sequences and microarray data. Obtaining knowledge from these data, exploring relationships between genes, understanding severe diseases, and developing drugs all require finding patterns in databases of large size and high dimensionality. Information retrieval and data mining are powerful tools for extracting information from such databases and information repositories. The integrative cluster analysis of both clinical and gene expression data has been shown to be an effective alternative for overcoming the above-mentioned problems. In this paper, we focus on how to improve search and clustering performance on genomic data over commonly used clustering techniques. In the proposed gene clustering technique, firstly, the high dimensionality of the microarray gene data is reduced using Locality Preserving Projections (LPP). LPP is chosen for the dimensionality reduction because of its ability to preserve the locality of neighborhood relationships. Secondly, performance experiments on real data sets show that the proposed fuzzy C-means method achieves higher efficiency, clustering quality, and automation than other clustering methods.

General Terms: Data Mining, Information Retrieval, Bioinformatics.
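As an illustration of the clustering step named above, the following is a minimal pure-Python sketch of standard fuzzy C-means. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, and the toy data are all assumptions made for the example. In the proposed technique the input would be the LPP-reduced expression profiles rather than raw values.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster `points` (a list of equal-length float lists) with fuzzy C-means.

    Returns (centers, memberships). memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    power = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update from squared distances to the new centres.
        new_u = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(d2) == 0.0:
                # Point coincides with a centre: full membership there.
                hit = d2.index(0.0)
                row = [1.0 if k == hit else 0.0 for k in range(n_clusters)]
            else:
                inv = [(1.0 / d) ** power for d in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Stop once no membership value moves by more than `tol`.
        change = max(abs(a - b)
                     for old, new in zip(u, new_u)
                     for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break

    return centers, u


# Toy data (hypothetical): two well-separated groups of 2-D "profiles".
toy_profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                [5.0, 5.1], [5.2, 5.0], [5.1, 4.9]]
centers, memberships = fuzzy_c_means(toy_profiles, n_clusters=2)
# Hard labels can be read off as the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard k-means, each gene keeps a graded membership in every cluster, which is useful when a gene participates in more than one expression pattern.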