SMSPL: Robust Multimodal Approach to Integrative Analysis of Multiomics Data

Recent technological advances have made it progressively easier to collect diverse types of genome-wide data, and it is commonly expected that analyzing these data in an integrated way will improve our understanding of complex biological systems. High noise, however, is one of the major challenges for multiomics data integration and may be the main cause of overfitting and poor generalization; current methods are prone to overfitting heavily noisy data, which limits their applicability. Sample reweighting is a typical strategy for coping with this problem. In this article, we propose a robust multimodal data integration method, called SMSPL, which simultaneously predicts cancer subtypes and identifies potentially significant multiomics signatures. Specifically, the proposed method leverages the linkages between different types of data to interactively recommend high-confidence samples, adopts a new soft weighting scheme to assign weights to the training samples of each type, and then iterates between recalculating the weights and updating the classifiers. Simulations and five real-data experiments substantiate the capability of the proposed method for classification and for identifying significant multiomics signatures in the presence of heavy noise. We expect SMSPL to be a small step forward in multiomics data integration and to help researchers understand biological processes comprehensively.
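
The alternating scheme described above (cross-modality sample recommendation, soft weighting, then classifier updates) can be illustrated with a minimal sketch. This is not the SMSPL formulation itself: the abstract does not specify the weighting function or the classifier, so the sketch substitutes the standard linear soft-weighting rule from self-paced learning and a plain L2-regularized logistic regression (which performs no signature selection). All function names and the parameters `lam`, `gamma`, and `growth` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_weighted_logreg(X, y, w, l2=1e-2, lr=0.1, n_steps=500):
    """Weighted logistic regression fitted by plain gradient descent.

    Assumes roughly standardized features; a stand-in classifier,
    not the one used by SMSPL.
    """
    beta, b = np.zeros(X.shape[1]), 0.0
    denom = max(w.sum(), 1e-12)  # guard against an all-zero weight vector
    for _ in range(n_steps):
        g = w * (sigmoid(X @ beta + b) - y)  # per-sample weighted residual
        beta -= lr * (X.T @ g / denom + l2 * beta)
        b -= lr * g.sum() / denom
    return beta, b

def sample_losses(X, y, beta, b):
    """Per-sample logistic loss under the current classifier."""
    p = np.clip(sigmoid(X @ beta + b), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def soft_weights(own_loss, other_weights, lam, gamma):
    """Linear soft weighting, v = max(0, 1 - loss / threshold).

    Assumption: samples trusted by the other modalities receive a
    larger threshold (lam + gamma * other_weights), which is one simple
    way to let modalities recommend samples to each other.
    """
    threshold = lam + gamma * other_weights
    return np.clip(1.0 - own_loss / threshold, 0.0, 1.0)

def multimodal_spl(views, y, lam=0.5, gamma=0.5, growth=1.3, n_rounds=5):
    """Alternate between recalculating weights and updating classifiers."""
    n = len(y)
    weights = [np.ones(n) for _ in views]
    models = [fit_weighted_logreg(X, y, w) for X, w in zip(views, weights)]
    for _ in range(n_rounds):
        losses = [sample_losses(X, y, *m) for X, m in zip(views, models)]
        new_weights = []
        for j, loss in enumerate(losses):
            # Average confidence that the *other* modalities assign to each sample.
            others = [weights[k] for k in range(len(views)) if k != j]
            other_mean = np.mean(others, axis=0) if others else np.zeros(n)
            new_weights.append(soft_weights(loss, other_mean, lam, gamma))
        weights = new_weights
        models = [fit_weighted_logreg(X, y, w) for X, w in zip(views, weights)]
        lam *= growth  # gradually admit harder samples, as in self-paced learning
    return models, weights
```

Here `views` is a list of feature matrices (one per omics type, with rows aligned to the same samples) and `y` is a binary label vector of 0s and 1s. The returned per-modality weights lie in [0, 1]; under these assumptions, samples whose labels disagree with the fitted classifiers tend to receive weights near zero and therefore contribute little to the next update.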