Summarizing videos into a target language: Methodology, architectures and evaluation

The aim of this work is to report the results of the Chist-Era project AMIS (Access Multilingual Information opinionS). The purpose of AMIS is to answer the following question: how can information in a foreign language be made accessible to everyone? This goes beyond translating a source video into a target-language video, since the objective is to provide only the main idea of an Arabic video in English. Achieving this requires research in several areas, not all of which have reached maturity: video summarization, speech recognition, machine translation, audio summarization and speech segmentation. In this article we present several possible architectures for achieving this objective, but we focus on only one of them. The main scientific obstacles are presented, and we explain how we address them. One of the major challenges of this work is to devise a way to objectively evaluate a system composed of several components, given that each component has its own limitations and that errors made by the first component can propagate through all the following ones. A subjective evaluation procedure is also proposed, in which several annotators were mobilized to assess the quality of the generated summaries.
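A common way to separate a component's own errors from errors it inherits upstream is to run each component twice: once inside the full cascade, and once fed with the reference (error-free) output of the previous stage. The following is a minimal sketch of that idea, assuming a purely text-to-text chain. The stage names, the toy functions and the transliterated words are hypothetical stand-ins chosen for illustration. They are not the AMIS components or the evaluation procedure used in this work.

```python
from typing import Callable, Dict, List, Tuple

# A stage is (name, function); each function maps the previous stage's text to new text.
Stage = Tuple[str, Callable[[str], str]]


def run_cascade(stages: List[Stage], source: str) -> Dict[str, str]:
    """End-to-end mode: each stage consumes the (possibly erroneous) output of the one before."""
    outputs: Dict[str, str] = {}
    current = source
    for name, fn in stages:
        current = fn(current)
        outputs[name] = current
    return outputs


def run_oracle(stages: List[Stage], source: str, references: Dict[str, str]) -> Dict[str, str]:
    """Component mode: each stage consumes the reference output of the stage before,
    so its result reflects only its own errors."""
    outputs: Dict[str, str] = {}
    previous = source
    for name, fn in stages:
        outputs[name] = fn(previous)
        previous = references[name]
    return outputs


# Hypothetical toy stand-ins, NOT the AMIS components.
LEXICON = {"alkitab": "the book", "jadid": "new", "qadim": "old"}


def toy_asr(spoken: str) -> str:
    # Simulate a single recognition error: "jadid" is misheard as "qadim".
    return spoken.replace("jadid", "qadim")


def toy_mt(text: str) -> str:
    # Word-by-word dictionary lookup; unknown words are copied through unchanged.
    return " ".join(LEXICON.get(w, w) for w in text.split())


def toy_summarizer(text: str) -> str:
    # Keep only the first three words as the "main idea".
    return " ".join(text.split()[:3])


stages: List[Stage] = [("asr", toy_asr), ("mt", toy_mt), ("summary", toy_summarizer)]
source = "alkitab jadid"
references = {"asr": "alkitab jadid", "mt": "the book new", "summary": "the book new"}

if __name__ == "__main__":
    print(run_cascade(stages, source)["summary"])              # the book old
    print(run_oracle(stages, source, references)["summary"])   # the book new
```

In this toy setting the summarizer is error-free when given reference input, yet the end-to-end summary is wrong because the single recognition error is carried through translation and summarization. Comparing the two modes stage by stage is one way to attribute a degraded final summary to the component that caused it.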
