Fine-Tuning Language Models with Advantage-Induced Policy Alignment