Feature selection method using WF-LASSO for gene expression data analysis

Much research aims to explain biological phenomena or the origins of disease, and to classify or diagnose the state of cells. Such studies usually rely on the strength of gene expression measured under specific conditions with microarrays, which can observe tens of thousands of gene expression profiles. Using all of the attributes is not feasible because of the sheer volume of gene expression data involved in microarray experiments. Because microarray data contain few samples relative to the number of attributes, analysis is prone to overfitting and incurs a high computational cost due to the high dimensionality of the data. We propose a feature selection method that combines a filter method with the wavelet transform and LASSO regression, a method based on statistical regression analysis. The best classification results were obtained by applying, in order, the discrete wavelet transform (DWT), the filter method, and finally LASSO; that is, the feature selection method with the best classification performance was WF-LASSO. The contribution of this paper is that the proposed method reduces the dimensionality of high-volume data, which improves classification performance and allows a more stable classification model to be constructed.
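
The ordering described above (DWT, then a filter, then LASSO) can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the filter statistic, or the LASSO solver, so everything concrete here is an assumption made for illustration: a one-level Haar transform applied to each sample's expression profile, a Welch t-statistic as the filter score, a coordinate-descent LASSO, and the hypothetical names `haar_dwt`, `t_scores`, `lasso_cd`, and `wf_lasso`. The data are synthetic. A real analysis would more likely use established wavelet and regression libraries than this standard-library version.

```python
import math
import random
import statistics


def haar_dwt(signal):
    """One-level Haar DWT; returns the approximation coefficients only."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd length by repeating the last value
    s = math.sqrt(2.0)
    return [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]


def t_scores(X, y):
    """Absolute Welch t-statistic of each column for binary labels in {0, 1}."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = math.sqrt(statistics.variance(a) / len(a)
                       + statistics.variance(b) / len(b))
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12))
    return scores


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xw||^2 + alpha * ||w||_1.

    Columns of X and the target y are centered, so no intercept is returned.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual for w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_w
    return w


def wf_lasso(X, y, k, alpha):
    """Return indices of wavelet coefficients kept after DWT -> filter -> LASSO."""
    W = [haar_dwt(row) for row in X]                      # 1) wavelet transform
    scores = t_scores(W, y)                               # 2) filter ranking
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    Wk = [[row[j] for j in top] for row in W]
    w = lasso_cd(Wk, [float(v) for v in y], alpha)        # 3) LASSO on survivors
    return [top[j] for j, wj in enumerate(w) if wj != 0.0]


# Synthetic example: 20 samples, 64 "genes"; genes 0 and 1 are shifted in class 1.
random.seed(0)
labels = [0] * 10 + [1] * 10
data = [[random.gauss(0.0, 1.0) + (3.0 if label == 1 and g < 2 else 0.0)
         for g in range(64)] for label in labels]
selected = wf_lasso(data, labels, k=8, alpha=0.05)
print("selected wavelet coefficients:", selected)
```

In this sketch the filter keeps the `k` highest-scoring wavelet coefficients, and LASSO then zeroes out those that add little once the others are in the model. The parameters `k` and `alpha` are illustrative values, not settings taken from the paper.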