Towards learning a semantic-consistent subspace for cross-modal retrieval