Parallel Affinity Propagation Clustering in Identifying Sub-Network Biomarker Genes of Lung Cancer

Lung cancer is a complex disease, and identifying its biomarkers remains challenging. Affinity Propagation (AP) is a clustering algorithm that groups data by iteratively identifying similar data points. Applied to a microarray dataset, AP runs into a scalability issue because of the large number of data points. In this work, Pearson's correlation was used to calculate a similarity matrix, which was then pruned to construct a gene co-expression network. AP was applied to identify sub-network biomarkers in four lung cancer expression datasets from two different microarray platforms. Parallel computing was applied to handle the high dimensionality and to reduce the time spent computing Pearson's correlations and constructing the similarity matrix.
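The similarity step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the row-block decomposition, the use of the absolute correlation for pruning, the threshold of 0.8, and the worker count are all assumptions made for illustration. A standard-library thread pool stands in for whatever parallel framework the work actually used; in CPython, threads mainly show the decomposition, so a process pool or a message-passing framework would be the natural substitute for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def standardize(profile):
    """Center one gene's expression profile and scale it to unit length."""
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return [c / norm for c in centered]


def correlation_rows(z, rows):
    """Pearson correlations of each gene in `rows` against every gene.

    For standardized profiles the correlation is a plain dot product.
    """
    return [[sum(a * b for a, b in zip(z[i], zj)) for zj in z] for i in rows]


def pruned_similarity(expr, threshold=0.8, workers=4):
    """Build a Pearson similarity matrix block by block, then prune it.

    `expr` is a list of gene profiles (one list of sample values per gene).
    `threshold` and `workers` are illustrative values, not from the paper.
    """
    z = [standardize(profile) for profile in expr]
    n = len(z)

    # Split the gene indices into contiguous row blocks, one per worker.
    size = max(1, -(-n // workers))  # ceiling division
    blocks = [range(start, min(start + size, n)) for start in range(0, n, size)]

    # Each worker fills in the rows of its own block; map() keeps block order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda rows: correlation_rows(z, rows), blocks)
        sim = [row for part in parts for row in part]

    # Pruning (assumed rule): drop self-similarity and weak correlations,
    # leaving the edges of a co-expression network.
    for i in range(n):
        for j in range(n):
            if i == j or abs(sim[i][j]) < threshold:
                sim[i][j] = 0.0
    return sim


if __name__ == "__main__":
    # Four toy genes measured across four samples (made-up values).
    expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 2]]
    for row in pruned_similarity(expr, threshold=0.8, workers=2):
        print([round(v, 3) for v in row])
```

The nonzero entries that survive pruning define the edges of the co-expression network, and the resulting matrix is the kind of input a clustering step such as AP would consume. Because each row block depends only on the standardized profiles, the blocks can be computed independently, which is what makes this step straightforward to distribute.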
