Gene expression classifiers and out-of-class sample detection

The proper application of statistics, machine learning, and data-mining techniques in routine clinical diagnostics to classify diseases by their gene expression profiles is still a challenge. One critical issue is the general inability of most state-of-the-art classifiers to identify out-of-class samples, i.e., samples that do not belong to any of the available classes. This paper proposes a possible explanation for this problem and suggests how, by analyzing the distribution of the class probability estimates generated by a classifier, it is possible to build decision rules that significantly improve its performance.
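A minimal sketch of one such decision rule is shown below in Python. It illustrates the general idea under stated assumptions and is not the paper's method. It assumes the classifier returns one vector of class probability estimates per sample, uses the maximum estimate as a confidence score, and rejects a sample as out-of-class when that score falls below a threshold set at a low empirical quantile of the scores observed on in-class validation samples. The function names, the quantile value, the class labels, and the probability values are all hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical helpers illustrating a max-probability rejection rule.
# Assumption: each sample is represented by the classifier's class
# probability estimates (one value per known class, summing to 1).


def fit_rejection_threshold(
    validation_probs: List[Sequence[float]], quantile: float = 0.05
) -> float:
    """Return the empirical `quantile` (nearest-rank) of the maximum class
    probability observed on validation samples known to be in-class."""
    if not validation_probs:
        raise ValueError("need at least one validation sample")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    scores = sorted(max(p) for p in validation_probs)
    # Nearest-rank method: smallest score such that at least `quantile`
    # of the validation scores are at or below it.
    rank = math.ceil(quantile * len(scores))
    return scores[rank - 1]


def classify_with_rejection(
    probs: Sequence[float], labels: Sequence[str], threshold: float
) -> Optional[str]:
    """Return the most probable label, or None when the sample looks
    out-of-class (its top probability is below `threshold`)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # no known class is supported confidently enough
    return labels[best]


if __name__ == "__main__":
    labels = ["A", "B", "C"]  # hypothetical class names
    # Made-up probability estimates for in-class validation samples.
    validation = [
        [0.90, 0.05, 0.05],
        [0.80, 0.10, 0.10],
        [0.70, 0.20, 0.10],
        [0.95, 0.03, 0.02],
        [0.60, 0.30, 0.10],
    ]
    t = fit_rejection_threshold(validation, quantile=0.3)
    print(t)  # 0.7
    print(classify_with_rejection([0.85, 0.10, 0.05], labels, t))  # A
    print(classify_with_rejection([0.50, 0.30, 0.20], labels, t))  # None
```

The quantile trades off false rejections of genuine in-class samples against missed out-of-class samples. Other summaries of the probability estimates, such as per-class thresholds or the entropy of the estimate vector, could replace the maximum within the same scheme.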
