Preprocessing and Feature Extraction Techniques for Multimodal Interactive Transcription of Text Images