Multimodal Deep Boltzmann Machines for feature selection on gene expression data

In this paper, a multimodal Deep Boltzmann Machine (DBM) is employed to identify important genes (biomarkers) in gene expression data from human colorectal carcinoma. The learning process involves gene expression data and several patient phenotypes, such as lymph node and distant metastasis occurrence. The proposed framework trains a multimodal DBM on records with metastasis occurrence. The trained model is then tested on records with no metastasis occurrence, and the Mean Squared Error (MSE) between the reconstructed and the original gene expression data is measured for each gene. Genes are ranked in descending order of MSE, so the top-ranked gene has the highest MSE value. Next, k-means clustering is performed using various numbers of top-ranked genes. The gene subset that gives the highest purity index is taken as the set of important genes. The important genes obtained from the proposed framework are compared with those obtained from a two-sample t-test. In terms of metastasis classification accuracy, the genes selected by the proposed framework give higher results than the top genes from the two-sample t-test.
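As an illustration of the ranking and purity steps described in the abstract, the following minimal Python sketch computes a per-gene MSE between original and reconstructed expression values, ranks genes by that error, and evaluates a purity index for a given clustering. It is not the authors' implementation: the function names rank_genes_by_mse and purity are hypothetical, the "reconstructed" values are hard-coded toy numbers standing in for the output of a trained multimodal DBM, and the cluster labels are supplied by hand where the framework would use k-means.

```python
from collections import Counter


def rank_genes_by_mse(original, reconstructed):
    """Return (ranking, mse): gene indices sorted by descending per-gene MSE.

    Both arguments are lists of samples; each sample is a list holding one
    expression value per gene.
    """
    n_samples = len(original)
    n_genes = len(original[0])
    mse = [
        sum((original[i][g] - reconstructed[i][g]) ** 2 for i in range(n_samples))
        / n_samples
        for g in range(n_genes)
    ]
    ranking = sorted(range(n_genes), key=lambda g: mse[g], reverse=True)
    return ranking, mse


def purity(cluster_labels, true_labels):
    """Fraction of samples that fall in the majority class of their cluster."""
    counts_by_cluster = {}
    for cluster, label in zip(cluster_labels, true_labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in counts_by_cluster.values()
    )
    return majority_total / len(true_labels)


# Toy data (assumed values): 3 samples x 3 genes.
original = [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [3.0, 2.0, 1.0]]
# Stand-in for the DBM reconstruction of the same samples.
reconstructed = [[1.0, 2.5, 0.0], [2.0, 1.5, 2.0], [3.0, 2.5, 4.0]]

ranking, mse = rank_genes_by_mse(original, reconstructed)

# Hand-written cluster assignments standing in for k-means output,
# with "M" = metastasis and "N" = no metastasis.
example_purity = purity([0, 0, 1, 1], ["M", "M", "M", "N"])
```

In the framework, the purity computation would be repeated for clusterings built on the top-k ranked genes for several values of k, keeping the k with the highest purity.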