Prediction of O-Glycosylation by Support Vector Machines and Semi-supervised Learning

Glycosylation is a central topic in understanding living systems. More than half of all proteins are glycosylated, a modification that contributes to their function, structural stability, and biological diversity. O-glycosylation is one of the two main types of mammalian protein glycosylation. Although O-glycosylation is known to be specific to serine or threonine residues, no consensus sequence has yet been identified. By contrast, both the binding process and the consensus sequence have been clarified for the other type, N-glycosylation. We use support vector machines (SVMs) to predict O-glycosylation sites from experimental data. The input information includes the protein primary sequence and the structural and biochemical characteristics around each prediction target. Our aim is to elucidate the glycosylation mechanism and to determine whether any motifs exist. This paper also reports results obtained with semi-supervised learning using a transductive SVM, which accounts for the possibility of unobserved glycosylation sites, and with a marginalized kernel, which accounts for hidden variables.
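As a rough illustration of the kind of sequence input described above, the following Python sketch extracts a fixed-length window centered on each serine (S) or threonine (T) and one-hot encodes it. A linear kernel on these vectors simply counts the positions at which two windows carry the same residue. The window half-width, the padding symbol `X`, and the function names `candidate_windows`, `one_hot`, and `linear_kernel` are hypothetical choices made for this sketch, not the settings used in the paper. The structural and biochemical features mentioned above are omitted.

```python
# Hypothetical sketch: encode sequence windows around candidate S/T sites.
# The alphabet, padding symbol, and window size are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed symbol for positions that fall outside the sequence
ALPHABET = AMINO_ACIDS + PAD


def candidate_windows(sequence, half_width=3):
    """Return (position, window) for every S or T in the sequence.

    Each window has length 2 * half_width + 1 and is centered on the
    candidate residue; the termini are padded with PAD.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # sequence[i] sits at padded[i + half_width], so the centered
            # window starts at padded index i.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window):
    """One-hot encode a window; unknown residues are mapped to PAD."""
    vector = []
    for residue in window:
        symbol = residue if residue in ALPHABET else PAD
        vector.extend(1.0 if symbol == a else 0.0 for a in ALPHABET)
    return vector


def linear_kernel(u, v):
    """Dot product; for one-hot windows this counts matching positions."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    for pos, win in candidate_windows("MSTPAG", half_width=2):
        print(pos, win)
```

A Gram matrix built with `linear_kernel` could, for example, be passed to scikit-learn's `sklearn.svm.SVC(kernel="precomputed")` for supervised training. A transductive SVM or a marginalized kernel, as studied in the paper, would replace the training step or the kernel, respectively, and is not shown here.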