Identification of interaction patterns and classification with applications to microarray data

Emerging patterns are a class of interaction structures that was recently proposed as a tool in data mining. A new and more general definition, based on the underlying probabilities, is proposed. The resulting interaction patterns (IPs) carry information about which combinations of variables are relevant for distinguishing between classes. Since IPs are formally quite similar to the leaves of a classification tree, a fast and simple method based on the CART algorithm is proposed for finding the corresponding empirical patterns in data sets. Simulations show that the method is effective at identifying patterns. In addition, the detected patterns can be used to define new variables for classification, and a simple scheme for using them to improve the performance of classification procedures is proposed. The method may also be seen as a way of improving CART with respect to both the identification of IPs and the accuracy of prediction.
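The correspondence between patterns and tree leaves, and the use of patterns as new variables, can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a pattern is a conjunction of threshold conditions such as those along a root-to-leaf path of a CART tree, and it scores a pattern by the ratio of its empirical supports in two classes, in the spirit of the growth rate used for emerging patterns. The names `Pattern`, `support`, `growth_rate` and `pattern_features`, as well as the toy data, are hypothetical; the code does not reproduce the probability-based definition or the search procedure proposed in this paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical encoding of one condition: (j, t, "<=") means x[j] <= t,
# and (j, t, ">") means x[j] > t.
Condition = Tuple[int, float, str]


@dataclass(frozen=True)
class Pattern:
    """Assumed representation: a conjunction of threshold conditions,
    i.e. the kind of region a root-to-leaf path of a CART tree describes."""
    conditions: Tuple[Condition, ...]

    def covers(self, x: Sequence[float]) -> bool:
        # An observation is covered only if it satisfies every condition.
        for j, t, op in self.conditions:
            if op == "<=" and not x[j] <= t:
                return False
            if op == ">" and not x[j] > t:
                return False
        return True


def support(pattern: Pattern, X: List[List[float]], y: List[int], cls: int) -> float:
    """Empirical fraction of class-`cls` observations covered by `pattern`."""
    members = [x for x, label in zip(X, y) if label == cls]
    if not members:
        return 0.0
    return sum(pattern.covers(x) for x in members) / len(members)


def growth_rate(pattern: Pattern, X, y, from_cls: int, to_cls: int) -> float:
    """Ratio supp(to_cls) / supp(from_cls), as in the emerging-pattern
    literature; infinite if the pattern occurs only in `to_cls`."""
    s_from = support(pattern, X, y, from_cls)
    s_to = support(pattern, X, y, to_cls)
    if s_from == 0.0:
        return float("inf") if s_to > 0.0 else 0.0
    return s_to / s_from


def pattern_features(patterns: List[Pattern], X) -> List[List[int]]:
    """Encode each observation as 0/1 indicators of pattern membership,
    which could be appended to the original variables for a classifier."""
    return [[int(p.covers(x)) for p in patterns] for x in X]


# Toy usage (hypothetical data): two variables, two classes.
X = [[0.1, 2.0], [0.2, 3.0], [0.9, 2.5], [0.8, 0.5], [0.7, 0.4], [0.3, 0.2]]
y = [0, 0, 1, 1, 1, 0]
p = Pattern(((0, 0.5, ">"), (1, 1.0, "<=")))  # x0 > 0.5 and x1 <= 1.0

print(support(p, X, y, 1))          # share of class 1 covered
print(growth_rate(p, X, y, 0, 1))   # pattern never occurs in class 0
print(pattern_features([p], X))     # new binary variable
```

The last function mirrors the idea of using detected patterns to define new variables: the 0/1 indicators could be added to the original predictors before fitting any classifier. In practice the conditions would be read off a fitted tree rather than written by hand.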
