Cali-NCE: Boosting Cross-modal Video Representation Learning with Calibrated Alignment