In this paper, we present a new approach to object segmentation in videos based on Gaussian Mixture Models (GMMs) and a Spatial-Temporal Markov Random Field (ST-MRF). Whereas traditional spatial MRF models consider only the correlation between a pixel and its spatial neighborhood, the proposed ST-MRF model extends this correlation to pixels in adjacent frames. When training the ST-MRF, the proposed model computes the mean and variance of each partitioned region across sequential frames by updating the GMM parameters. Moreover, the ST-MRF energy function is improved by using the spatial-temporal neighboring cliques of each pixel as a reference term. We compare the proposed method with several state-of-the-art methods, including standard GMMs, Mean Shift, and Fuzzy C-Means (FCM), on a public standard video library and on complex videos captured in real environments. The results demonstrate that our approach improves robustness, accuracy, and effectiveness.
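To make the idea of an energy function over spatial-temporal cliques concrete, the following is a minimal sketch and not the formulation used in this paper. It assumes grayscale frames, one Gaussian per region label, a Potts-style penalty for disagreeing neighbors, and a 4-connected spatial neighborhood plus the same pixel in the adjacent frame as the temporal neighbor. The function name `st_mrf_energy` and the weights `beta_s` and `beta_t` are hypothetical and introduced only for illustration.

```python
import numpy as np


def st_mrf_energy(frames, labels, means, variances, beta_s=1.0, beta_t=1.0):
    """Energy of a labeling of a short video clip (illustrative sketch).

    frames:    (T, H, W) array of grayscale intensities
    labels:    (T, H, W) integer array with values in [0, K)
    means:     length-K sequence, Gaussian mean of each region (assumed form)
    variances: length-K sequence, Gaussian variance of each region (assumed form)
    beta_s:    hypothetical weight of spatial pairwise cliques
    beta_t:    hypothetical weight of temporal pairwise cliques
    """
    frames = np.asarray(frames, dtype=float)
    labels = np.asarray(labels)
    mu = np.asarray(means, dtype=float)[labels]        # per-pixel mean
    var = np.asarray(variances, dtype=float)[labels]   # per-pixel variance

    # Data term: negative Gaussian log-likelihood of each pixel under its label.
    data = 0.5 * np.log(2.0 * np.pi * var) + (frames - mu) ** 2 / (2.0 * var)

    # Spatial cliques: count label disagreements between vertical and
    # horizontal neighbors inside each frame (Potts-style penalty).
    spatial = np.sum(labels[:, 1:, :] != labels[:, :-1, :])
    spatial += np.sum(labels[:, :, 1:] != labels[:, :, :-1])

    # Temporal cliques: count label disagreements between the same pixel
    # position in adjacent frames.
    temporal = np.sum(labels[1:] != labels[:-1])

    return float(data.sum() + beta_s * spatial + beta_t * temporal)
```

Under these assumptions, a labeling that flips a single pixel away from its neighbors pays both a data cost and a penalty for every spatial and temporal clique it breaks. That penalty is what favors segmentations that are coherent within a frame and consistent across adjacent frames.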