Policy Optimization of the Power Allocation Algorithm Based on the Actor–Critic Framework in Small Cell Networks