Modeling video using input/output Markov models with application to multi-modal event detection