Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach