MU-MIMO technology enables a base station (BS) to transmit signals to multiple users simultaneously on the same frequency band. It is a key technology for 5G NR to increase the data rate. In 5G specifications, an MU-MIMO scheduler needs to determine RBs allocation and MCS assignment to each user for each TTI. Under MU-MIMO, multiple users may be coscheduled on the same RB and each user may have multiple data streams simultaneously. In addition, the scheduler must meet the stringent real-time requirement (~1 ms) during decision making to be useful. This paper presents mCore, a novel 5G scheduler that can achieve ~1 ms scheduling with joint optimization of RB allocation and MCS assignment to MU-MIMO users. The key idea of mCore is to perform a multi-phase optimization, leveraging large-scale parallel computation. In each phase, mCore either decomposes the optimization problem into a number of independent sub-problems, or reduces the search space into a smaller but most promising subspace, or both. We implement mCore on a commercial-off-the-shelf GPU. Experimental results show that mCore can offer the best scheduling performance for up to 100 RBs, 100 users, 29 MCS levels and 4 × 12 antennas when compared to other state-of-the-art algorithms. It is also the only algorithm that can find its scheduling solution in ~1 ms.