Learning-Based Mean-Payoff Optimization in Unknown Markov Decision Processes under Omega-Regular Constraints