Transcription and Analysis of Multimodal Texts