SlideSeer: a digital library of aligned document and presentation pairs

Research findings are often communicated both as written documents and as narrated slide presentations. Because the two media contain both unique and overlapping information, it is useful to align them and combine them into a single synchronized medium. We introduce SlideSeer, a digital library that discovers, aligns and presents such presentation and document pairs. We discuss the three major components of the SlideSeer DL: 1) resource discovery, 2) fine-grained alignment and 3) the user interface. For resource discovery, we bootstrap our collection-building process using metadata from DBLP and CiteSeer. For alignment, we modify maximum similarity alignment to favor monotonic alignments and incorporate a classifier to handle slides that should not be aligned. For the user interface, we allow the user to switch seamlessly between four carefully motivated views of the resulting synchronized media pairs.
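
The alignment idea can be illustrated with a minimal sketch. This is not the SlideSeer implementation: the bag-of-words cosine similarity, the names `align`, `jump_penalty` and `null_threshold`, and the use of a fixed similarity threshold in place of a trained classifier are all assumptions made for illustration. The sketch assigns each slide to a document section by maximizing total similarity, subtracts a penalty whenever the alignment moves backwards in the document, and leaves slides with no sufficiently similar section unaligned.

```python
from collections import Counter
import math


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two text snippets."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def align(slides, sections, jump_penalty=0.2, null_threshold=0.1):
    """Return, for each slide, a section index or None (left unaligned).

    jump_penalty and null_threshold are hypothetical parameters; the
    threshold is only a stand-in for a classifier that decides which
    slides should not be aligned.
    """
    sim = [[cosine(s, d) for d in sections] for s in slides]
    result = [None] * len(slides)
    # Slides whose best similarity is too low stay unaligned.
    keep = [i for i, row in enumerate(sim)
            if max(row, default=0.0) >= null_threshold]
    if not keep:
        return result

    n = len(sections)

    def step(prev, k, j):
        # Score of coming from section k to section j; going backwards costs.
        return prev[k] - (jump_penalty if k > j else 0.0)

    # Viterbi-style dynamic program over the slides that are kept.
    score = [sim[keep[0]][:]]
    back = []
    for i in keep[1:]:
        prev, row, ptr = score[-1], [], []
        for j in range(n):
            best_k = max(range(n), key=lambda k: step(prev, k, j))
            row.append(step(prev, best_k, j) + sim[i][j])
            ptr.append(best_k)
        score.append(row)
        back.append(ptr)

    # Trace back the best-scoring path.
    j = max(range(n), key=lambda k: score[-1][k])
    for t in range(len(keep) - 1, -1, -1):
        result[keep[t]] = j
        if t > 0:
            j = back[t - 1][j]
    return result


slides = [
    "introduction motivation digital library",
    "alignment algorithm similarity",
    "thank you questions",
    "user interface views",
]
sections = [
    "introduction we motivate the digital library",
    "alignment algorithm uses similarity",
    "user interface offers four views",
]
print(align(slides, sections))  # [0, 1, None, 2]
```

In this toy input the "thank you" slide shares no words with any section, so it is left unaligned, while the remaining slides follow the document order. Setting `jump_penalty` to 0 reduces the sketch to plain per-slide maximum similarity alignment.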
