Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)

This paper reports the first known effort to automatically align the spoken utterances in recorded lectures with the content of the slides used. Such technology would be useful for Massive Open Online Courses (MOOCs), other recorded lectures, and many other applications. We propose a set of approaches that address two observations: the words helpful for such alignment are sparse and noisy, and the presentation of a slide usually proceeds smoothly from top to bottom. The approaches include utterance clustering, entropy-based word filtering, reliability-propagated word-based matching, and structured support vector machine (SVM) learning from local and global features. Initial experiments on the lectures of a course offered at National Taiwan University gave very encouraging results compared with the baseline approaches.
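As an illustration of the entropy-based word filtering mentioned above, the following is a minimal sketch and not the paper's formulation, which the abstract does not specify. It assumes that a word spread evenly across many slides (high entropy) says little about which slide an utterance belongs to, while a word concentrated on one slide (low entropy) is a useful cue. The function names `word_entropy` and `filter_words` and the threshold `max_entropy` are hypothetical.

```python
import math
from collections import Counter


def word_entropy(word, slides):
    """Shannon entropy (in bits) of a word's occurrence distribution over slides.

    `slides` is a list of token lists, one per slide (an assumed representation).
    Returns None if the word never occurs.
    """
    counts = [Counter(slide)[word] for slide in slides]
    total = sum(counts)
    if total == 0:
        return None
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def filter_words(slides, max_entropy):
    """Keep only words whose entropy over slides is at most `max_entropy`.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    vocab = {w for slide in slides for w in slide}
    return {w for w in vocab if word_entropy(w, slides) <= max_entropy}


# Toy example: "the" appears on every slide, "svm" on two, the rest on one.
slides = [
    ["svm", "margin", "the"],
    ["entropy", "the"],
    ["kernel", "svm", "the"],
]
kept = filter_words(slides, max_entropy=0.5)
```

With this toy input, words confined to a single slide have zero entropy and are kept, while words shared across slides are filtered out. A real system would also have to handle recognition errors in the utterance transcripts, which this sketch ignores.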
