CM-test: An Innovative Divergence Measurement and Its Application in Diabetes Gene Expression Data Analysis

One important problem in data analysis is to effectively measure the divergence between two sets of values of a feature, each drawn from a group of samples under a particular condition. Such a measurement is the foundation for identifying critical features that contribute to the difference between the two conditions. The two traditional methods, the t-test and the Wilcoxon rank sum test, measure this divergence indirectly: the former uses the difference of the means of the two groups, and the latter uses the sum of the ranks from one of the groups. In this paper, we propose an innovative approach based on fuzzy set theory, the Cluster Misclassification test (CM-test), to quantify the divergence directly and robustly. To validate our approach, we conducted experiments on both synthetic and real diabetes gene expression datasets. On the synthetic datasets, we observed that the CM-test effectively quantifies the divergence of two sets. On the real diabetes dataset, eight of the top ten genes identified by the CM-test have been confirmed in the literature to be associated with diabetes. We suggest the remaining two genes, M95610 and M88461, as potential diabetic genes for further biological investigation. We therefore recommend the CM-test as another effective method for measuring the divergence of two sets, complementing the t-test and the Wilcoxon rank sum test in practice.
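To make the two indirect measures concrete, the following minimal Python sketch computes the statistics the traditional tests are built on: a t statistic from the difference of the group means, and the sum of the ranks of one group in the pooled sample. It illustrates only these two baselines and not the CM-test, whose details are not given in this section. The function names, the Welch (unequal-variance) form of the t statistic, and the toy expression values are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """t statistic based on the difference of the two group means.

    Assumption: the Welch (unequal-variance) form is used here.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_sum_statistic(a, b):
    """Wilcoxon rank sum: sum of the ranks of group a in the pooled sample.

    Tied values receive the average of the ranks they occupy.
    """
    pooled = sorted(list(a) + list(b))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Values at positions i..j-1 occupy 1-based ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(rank_of[x] for x in a)


# Hypothetical expression values of one gene under two conditions.
control = [1.0, 2.0, 3.0]
diabetic = [4.0, 5.0, 6.0]

t_value = welch_t_statistic(control, diabetic)
w_value = rank_sum_statistic(control, diabetic)
print(t_value, w_value)
```

Both statistics summarize each group through a single quantity (a mean or a rank total), which is the sense in which they measure divergence indirectly.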
