DeepCon: An End-to-End Multilingual Toolkit for Automatic Minuting of Multi-Party Dialogues

In this paper, we present DeepCon, an end-to-end toolkit for minuting the multi-party dialogues of meetings. It provides technological support for (multilingual) communication and collaboration, with a specific focus on Natural Language Processing (NLP) technologies: Automatic Speech Recognition (ASR), Machine Translation (MT), Automatic Minuting (AM), Topic Modelling (TM), and Named Entity Recognition (NER). To the best of our knowledge, no comparable tool is currently available. The tool follows a microservice architecture and is deployed on Amazon Web Services (AWS). We release it as open source at http://www.deepcon.in.
