Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

Background: The Signal-to-Noise Ratio (SNR) is often used to identify biomarkers for two-class problems, but no formal and useful generalization of SNR is available for multiclass problems. We propose generalizations of SNR for multiclass cancer discrimination by introducing two indices, the Gene Dominant Index and the Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes, and we use them to develop methodologies for discovering dominant and dormant biomarkers with biological significance. The dominancy and dormancy of the identified biomarkers, and their discriminating power, are also demonstrated pictorially using scatterplots of individual genes and 2-D Sammon's projections of the selected sets of genes. Using information from the literature, we show that the GDI-based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems.

Results and discussion: To evaluate the effectiveness of the GDIs, we used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then design prediction systems with six machine learning tools: the Nearest Neighbor Classifier (NNC), the Nearest Mean Classifier (NMC), and Support Vector Machine (SVM) classifiers with a linear kernel and with a Gaussian kernel, where each SVM is used with both the one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class-specific signatures. With all six tools and for all data sets, we achieved better or comparable prediction accuracies, usually with fewer marker genes, than results reported in the literature using the same computational protocols. Dominant genes are usually easy to find, while good dormant genes may not always be available because dormant genes must satisfy stronger constraints; when they are available, they can be used for authentication of a diagnosis.

Conclusion: Since GDI-based schemes can find a small set of dominant/dormant biomarkers that is adequate to design diagnostic prediction systems, they open up the possibility of using real-time qPCR assays or antibody-based methods such as ELISA for easy and low-cost diagnosis of diseases. The dominant and dormant genes found by GDIs can be used in different ways to design more reliable diagnostic prediction systems.
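For orientation, the two-class SNR that the proposed indices generalize is commonly computed per gene as the difference of the class means divided by the sum of the class standard deviations. The sketch below assumes that common form; the function name `snr_two_class` and the toy data are hypothetical, and the code does not implement the Gene Dominant or Gene Dormant Index, whose definitions are not given in this abstract.

```python
import numpy as np

def snr_two_class(x, y):
    """Per-gene two-class SNR: (mean_0 - mean_1) / (std_0 + std_1).

    x : array of shape (n_samples, n_genes) with expression values
    y : array of shape (n_samples,) with class labels 0 or 1
    Assumes each class has at least two samples and nonzero spread.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    class0, class1 = x[y == 0], x[y == 1]
    # Signal: separation of the class means for each gene.
    signal = class0.mean(axis=0) - class1.mean(axis=0)
    # Noise: sum of the within-class sample standard deviations.
    noise = class0.std(axis=0, ddof=1) + class1.std(axis=0, ddof=1)
    return signal / noise

# Toy data (made up): gene 0 separates the classes, gene 1 does not.
expr = np.array([[1.0, 5.0],
                 [3.0, 7.0],
                 [7.0, 5.0],
                 [9.0, 7.0]])
labels = np.array([0, 0, 1, 1])
scores = snr_two_class(expr, labels)
# Rank genes by absolute SNR, most discriminative first.
ranking = np.argsort(-np.abs(scores))
```

A large absolute score marks a gene whose expression differs strongly between the two classes relative to its within-class spread; with more than two classes this pairwise score no longer applies directly, which is the gap the abstract addresses.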
