Extractive Text-Based Summarization of Arabic Videos: Issues, Approaches and Evaluations

In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed within the scope of the AMIS project, which aims at helping users understand videos in a foreign language (here, Arabic). To that end, the project proposes several strategies to translate and summarize the videos. One of them consists of transcribing the Arabic videos, summarizing the transcriptions, and translating the summary. We describe the video corpus collected from YouTube, then present and evaluate the transcription-summarization part of this strategy. We also present the Automatic Speech Recognition (ASR) system used to transcribe the videos and show how we adapted it to the Algerian dialect. Then, we describe how we automatically segment the sequence of words produced by the ASR system into sentences, and how we summarize the resulting sequence of sentences. We evaluate our approach both objectively and subjectively. Results show that the ASR system performs well in terms of Word Error Rate on Modern Standard Arabic (MSA), but needs to be adapted to deal with Algerian dialect data. The subjective evaluation shows the same behaviour as ASR: transcriptions for videos containing dialectal data were scored higher than those for videos containing only MSA data. However, summaries based on transcriptions are not as well rated, even when the transcriptions themselves are better rated. Finally, the study shows that features such as the lengths of transcriptions and summaries, and the subjective score of transcriptions, explain only 31% of the subjective score of summaries.
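
For readers unfamiliar with the objective metric mentioned above, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the ASR output and a reference transcript, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption and ignores any text
    normalization a real Arabic evaluation would apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.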
