Feature selection is an active research area in pattern recognition and data mining (Duda et al., 2001). The importance of feature selection methods becomes apparent in the context of the rapidly growing amount of data collected in contemporary databases (Liu & Motoda, 2008). Feature subset selection procedures aim to discard as many features (measurements) as possible that are irrelevant or redundant for a given problem. The feature subset resulting from a selection procedure should allow a model built on the available learning data sets to generalize better to new (unseen) data. When classification or prediction models are being designed, feature subset selection procedures are expected to yield higher classification or prediction accuracy. The feature selection problem is particularly important and challenging when the number of objects represented in a database is low in comparison to the number of features used to characterize these objects. Such a situation typically arises in the exploration of genomic data sets, where the number of features can be thousands of times greater than the number of objects. Here we consider the relaxed linear separability (RLS) method of feature subset selection (Bobrowski & Łukaszuk, 2009). This approach to the feature selection problem refers to the concept of linear separability of the learning sets (Bobrowski, 2008). The term "relaxation" means here a deterioration of linear separability caused by the gradual neglect of selected features. The considered approach is based on repetitive minimization of convex and piecewise-linear (CPL) criterion functions. These CPL criterion functions, which have their origins in the theory of neural networks, include the costs of individual features (Bobrowski, 2005). Increasing the cost of a feature makes it fall out of the feature subspace.
The quality of the reduced feature subspaces is assessed by the accuracy of the CPL-optimal classifiers built in these subspaces. The article contains new theoretical and experimental results related to the RLS method of feature subset selection. The experimental results have been obtained through the analysis of, among others, two genomic data sets.
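The mechanism described above, minimizing a piecewise-linear criterion that includes per-feature costs so that raising a feature's cost drives it out of the model, can be loosely sketched with an L1-penalized hinge loss minimized by subgradient descent. This is only an illustrative analogue of the idea, not the authors' CPL basis-exchange algorithm; the function name, data, and hyperparameters below are hypothetical.

```python
import numpy as np

def cpl_like_feature_selection(X, y, feature_cost=0.1, lr=0.01, epochs=500, seed=0):
    """Minimize a perceptron-style piecewise-linear (hinge) loss plus an
    L1 feature-cost term by subgradient descent. Features whose weights
    are driven to zero are treated as discarded: a loose analogue of the
    RLS idea that increasing a feature's cost makes it fall out of the
    feature subspace. Illustrative sketch only, not the CPL algorithm.
    Labels y are expected to be +1 / -1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1.0                       # hinge-loss violators
        grad_w = -(y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        grad_w = grad_w + feature_cost * np.sign(w)  # L1 feature-cost subgradient
        w = w - lr * grad_w
        b = b - lr * grad_b
        w[np.abs(w) < lr * feature_cost] = 0.0     # truncate tiny weights to exact zeros
    selected = np.flatnonzero(np.abs(w) > 1e-8)
    return w, b, selected

# Hypothetical usage: 10 features, only the first one informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
w, b, selected = cpl_like_feature_selection(X, y, feature_cost=0.1)
```

With a moderate cost the informative feature survives, while a sufficiently large cost "relaxes" the fit until every feature is discarded, mirroring the gradual neglect of features described in the text.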
[1] Leon Bobrowski, et al., "Design of piecewise linear classifiers from formal neurons by a basis exchange technique," Pattern Recognition, 1991.
[2] T. Łukaszuk, et al., "Feature Selection Based on Relaxed Linear Separability," 2009.
[3] Jason Weston, et al., "Gene Selection for Cancer Classification using Support Vector Machines," Machine Learning, 2002.
[4] David G. Stork, et al., "Pattern Classification," 1973.
[5] Marcos Dipinto, et al., "Discriminant analysis," Predictive Analytics, 2020.
[6] Hiroshi Motoda, et al., "Book Review: Computational Methods of Feature Selection," The IEEE Intelligent Informatics Bulletin, 2007.
[7] Yudong D. He, et al., "Gene expression profiling predicts clinical outcome of breast cancer," Nature, 2002.
[8] Keinosuke Fukunaga, et al., "Introduction to Statistical Pattern Recognition," 1972.
[9] J. Mesirov, et al., "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, 1999.
[10] Leon Bobrowski, et al., "Selection of the Linearly Separable Feature Subsets," ICAISC, 2004.
[11] Vladimir Vapnik, et al., "Statistical learning theory," 1998.