Towards a model for mid-level feature representation of scenes