A deep action-oriented video image classification system for text detection and recognition