Comparing batch update with randomized update for identifying salient genes applied to cancer gene expression clustering

DNA microarrays typically screen a very large number of genes, many of which are redundant. In this paper, we study a neighbour-based method for gene assessment and apply it to the discovery of interesting clusters, with the aim of understanding relations among cancer gene expression data. During gene assessment, an adaptive vector records the genes’ saliences: each element of the vector represents the weight of the corresponding gene. We compare a batch update strategy with a randomized update strategy for iteratively updating this vector during gene assessment. In tests on two benchmark cancer gene expression datasets, the batch update strategy performs better than the randomized update strategy when gene assessment is applied to the discovery of interesting clusters.
