MBMM: Moment Estimating Beta Mixture Model-Based Clustering Algorithm for m6A Co-methylation Module Mining

Background: m6A methylation is a ubiquitous post-transcriptional modification in mammals. MeRIP-seq technology makes it possible to acquire transcriptome-wide m6A data under different conditions. Specific regulation by enzymes gives rise to co-methylation modules in m6A methylation level data. Mining co-methylation modules from these data can therefore help to unveil the mechanism of m6A methylation modification and its role in the occurrence and development of complex diseases such as cancer. Objective: To develop a clustering algorithm that can effectively mine m6A co-methylation modules. Method: In this study, a novel beta mixture model-based clustering algorithm named MBMM is proposed. It is built on the expectation-maximization (EM) framework and uses method-of-moments estimation in the M-step for parameter estimation, in order to handle high-dimensional, small-sample m6A data. Simulation studies were used to evaluate the clustering performance of the proposed algorithm, which was then applied to real data to mine co-methylation modules. Biological significance correlation analysis was used to examine whether the resulting clusters are co-methylation modules. Results and Conclusion: Simulation studies demonstrated that MBMM outperformed the other clustering algorithms. On real data, MBMM found seven co-methylation modules. Pathway-specific analysis of six m6A-related pathways showed that six co-methylation modules were enriched in these pathways and that their enrichment differed. Substrate-specific analysis of five enzymes revealed that the seven co-methylation modules showed varying degrees of enrichment. Gene Ontology enrichment analysis indicated that these modules may be regulated by enzymes while having potential functional specificity.
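
To make the Method concrete, the following is a minimal sketch of an EM loop for a beta mixture in which the M-step uses weighted method-of-moments estimates instead of numerical likelihood maximization. It is an illustration under stated assumptions, not the authors' implementation: methylation levels are assumed to lie in (0, 1), conditions (columns) are assumed independent within a cluster, the initialization by row mean and the fixed iteration count are arbitrary choices, and all function names are hypothetical.

```python
import math
import random

EPS = 1e-6  # assumed clipping bound that keeps logs and moments finite

def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, with x clipped into (0, 1)."""
    x = min(max(x, EPS), 1.0 - EPS)
    return ((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def moment_estimate(values, weights):
    """Weighted method-of-moments estimate of (a, b) for a beta distribution.

    With weighted mean m and variance v:
        a = m * (m(1-m)/v - 1),  b = (1-m) * (m(1-m)/v - 1)
    """
    total = sum(weights) + 1e-12
    mean = sum(w * v for w, v in zip(weights, values)) / total
    mean = min(max(mean, EPS), 1.0 - EPS)
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    # Bound the variance below m(1-m) so that both parameters stay positive.
    var = min(max(var, 1e-8), 0.999 * mean * (1.0 - mean))
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def e_step(data, weights, params):
    """Posterior probability of each cluster for each row (site)."""
    resp = []
    for row in data:
        logp = [math.log(max(w, 1e-12))
                + sum(beta_logpdf(x, a, b) for x, (a, b) in zip(row, comp))
                for w, comp in zip(weights, params)]
        top = max(logp)  # subtract the maximum for numerical stability
        p = [math.exp(v - top) for v in logp]
        s = sum(p)
        resp.append([v / s for v in p])
    return resp

def m_step(data, resp, n_clusters):
    """Update mixing weights and per-column beta parameters by moments."""
    n, d = len(data), len(data[0])
    weights, params = [], []
    for k in range(n_clusters):
        r_k = [resp[i][k] for i in range(n)]
        weights.append(sum(r_k) / n)
        params.append([moment_estimate([data[i][j] for i in range(n)], r_k)
                       for j in range(d)])
    return weights, params

def fit(data, n_clusters, n_iter=30):
    """Run EM for a fixed number of iterations (assumed stopping rule)."""
    n = len(data)
    # Assumed initialization: sort rows by mean level, split into equal groups.
    order = sorted(range(n), key=lambda i: sum(data[i]) / len(data[i]))
    resp = [[0.0] * n_clusters for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][min(rank * n_clusters // n, n_clusters - 1)] = 1.0
    for _ in range(n_iter):
        weights, params = m_step(data, resp, n_clusters)
        resp = e_step(data, weights, params)
    return weights, params, resp

if __name__ == "__main__":
    # Toy data only: two synthetic groups of sites over four conditions.
    rng = random.Random(0)
    low = [[rng.betavariate(2, 8) for _ in range(4)] for _ in range(40)]
    high = [[rng.betavariate(8, 2) for _ in range(4)] for _ in range(40)]
    weights, params, resp = fit(low + high, n_clusters=2)
    print("mixing weights:", [round(w, 3) for w in weights])
```

The closed-form moment update avoids an inner numerical optimization in every M-step, which is one plausible reason to prefer it when the number of sites is small relative to the number of parameters; the toy data in the demo are synthetic and carry no biological meaning.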