Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling