A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data

Background: Recent advances in array CGH (aCGH) research have significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two major challenges in developing aCGH sample clustering methods are the high spatial correlation between aCGH markers and low computational efficiency. A mixture hidden Markov model based algorithm was developed to address these two challenges.

Results: The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented, and analysis of real glioma aCGH data showed that it converges rapidly to the optimal clustering and that its computation time is proportional to the sample size. Simulation results showed that this HMM based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for the glioma data were significantly associated with clinical outcomes.

Conclusions: We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The HMMC software, in both R and C++, is available on the ND INBRE website: http://ndinbre.org/programs/bioinformatics.php
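To illustrate the general idea of HMM-based clustering, the following is a minimal sketch, not the authors' HMMC implementation. It assumes each sample has already been discretized into per-marker calls (0 = loss, 1 = neutral, 2 = gain), that each cluster is described by its own HMM, and that a sample is assigned to the cluster whose HMM gives it the highest likelihood. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import math

def hmm_log_likelihood(obs, init, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    obs   : list of observed calls along the genome (0=loss, 1=neutral, 2=gain)
    init  : init[s] = P(first hidden state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o] = P(observed call o | hidden state s)
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate along the marker order; this step is what captures
            # the spatial correlation between neighbouring markers.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)            # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def assign_cluster(obs, cluster_models):
    """Hard assignment: index of the cluster HMM with the highest likelihood."""
    scores = [hmm_log_likelihood(obs, *model) for model in cluster_models]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters: both clusters share the emission matrix,
# but one favours staying in the gain state and the other the loss state.
EMIT = [[0.80, 0.15, 0.05],
        [0.10, 0.80, 0.10],
        [0.05, 0.15, 0.80]]
GAIN_TRANS = [[0.80, 0.10, 0.10],
              [0.05, 0.80, 0.15],
              [0.02, 0.08, 0.90]]
LOSS_TRANS = [[0.90, 0.08, 0.02],
              [0.15, 0.80, 0.05],
              [0.10, 0.10, 0.80]]
gain_rich = ([0.1, 0.3, 0.6], GAIN_TRANS, EMIT)
loss_rich = ([0.6, 0.3, 0.1], LOSS_TRANS, EMIT)
models = [gain_rich, loss_rich]

print(assign_cluster([2, 2, 2, 1, 2, 2], models))  # mostly gains
print(assign_cluster([0, 0, 0, 1, 0, 0], models))  # mostly losses
```

A full mixture-HMM clustering method would also have to estimate the cluster HMM parameters from the data, for example by alternating between this assignment step and re-estimating each cluster's parameters; that part is omitted here.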
