Paper information - On an ensemble algorithm for clustering cancer patient data

On an ensemble algorithm for clustering cancer patient data

Background: The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. The TNM should therefore be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Although results obtained with EACCD have been reported, the algorithm itself has not been analyzed. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches.

Results: Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests employed in the learning step, however, have minimal effect on the dendrogram for large m. In addition, if the step for learning dissimilarity is omitted from EACCD, the resulting approaches can show degraded performance. Furthermore, clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters.

Conclusions: When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.
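The two stages named in the abstract (learning a dissimilarity over m partitioning runs, then building a dendrogram with a linkage function) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: the initial dissimilarity matrix `d0` is made-up toy data standing in for values that EACCD would derive from a statistical test on survival; `k_medoids` is a simplified alternating k-medoids routine used in place of full PAM; the number of clusters is drawn at random in each run; and the learned dissimilarity is taken to be the fraction of runs in which two groups fall into different clusters. All function names are hypothetical.

```python
import random

def k_medoids(d, k, rng, max_iter=50):
    """Simplified alternating k-medoids on a dissimilarity matrix (stand-in for PAM)."""
    n = len(d)
    medoids = rng.sample(range(n), k)
    for _ in range(max_iter):
        # Assign every item to its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # New medoid: the member with the smallest total dissimilarity to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(d[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

def learn_dissimilarity(d0, m, seed=0):
    """Fraction of the m runs in which items i and j land in different clusters."""
    n = len(d0)
    rng = random.Random(seed)
    apart = [[0] * n for _ in range(n)]
    for _ in range(m):
        k = rng.randint(2, n - 1)  # assumption: random number of clusters per run
        labels = k_medoids(d0, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] != labels[j]:
                    apart[i][j] += 1
    return [[apart[i][j] / m for j in range(n)] for i in range(n)]

def agglomerate(d, linkage="average"):
    """Naive agglomerative clustering; returns the merges in dendrogram order."""
    def dist(a, b):
        pairs = [d[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        return sum(pairs) / len(pairs)  # average linkage

    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen linkage.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges

# Toy data (hypothetical): six patient groups summarized by one made-up score each.
scores = [0, 1, 2, 10, 11, 12]
d0 = [[abs(a - b) for b in scores] for a in scores]

learned = learn_dissimilarity(d0, m=200)
merges = agglomerate(learned, linkage="average")
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Changing `m` or the `linkage` argument in this sketch is one way to probe the sensitivity the abstract describes: with few runs the learned matrix is coarse, while with many runs it stabilizes and the choice of linkage becomes the main remaining source of variation.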

[1] J. Klein, et al. Survival Analysis: Techniques for Censored and Truncated Data, 1997.

[2] I. Langner. Survival Analysis: Techniques for Censored and Truncated Data, 2006.

[3] Ali S. Hadi, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1991.

[4] H. Burke, et al. Outcome prediction and the future of the TNM staging system, 2004, Journal of the National Cancer Institute.

[5] Xiuzhen Cheng, et al. Developing Prognostic Systems of Cancer Patients by Ensemble Clustering, 2009, Journal of biomedicine & biotechnology.

[6] Scott H. Kurtzman, et al. AJCC cancer staging atlas, 2006.

[7] Ashutosh Kumar Singh, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2010.

[8] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[9] H. Burke, et al. Criteria for prognostic factors and for an enhanced prognostic system, 1993, Cancer.

[10] F. Harrell, et al. Artificial neural networks improve the accuracy of cancer survival prediction, 1997, Cancer.

[11] Dengyuan Wu, et al. Analysis of an ensemble algorithm for clustering cancer data, 2012, 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops.

[12] Mitch Dowsett, et al. Beyond Anatomic Staging: Are We Ready to Take the Leap to Molecular Classification?, 2005.