Zero-Shot Multi-Label Topic Inference with Sentence Encoders

Sentence encoders have been shown to achieve superior performance on many downstream text-mining tasks and are therefore claimed to be fairly general. Inspired by this, we performed a detailed study of how to leverage these sentence encoders for the "zero-shot topic inference" task, where topics are defined or provided by users in real time. Extensive experiments on seven different datasets show that Sentence-BERT exhibits superior generality compared to the other encoders, while Universal Sentence Encoder can be preferred when efficiency is a top priority.
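The general idea behind encoder-based zero-shot topic inference can be illustrated with a minimal sketch, under stated assumptions: the document and each user-provided topic description are encoded into vectors, and every topic whose cosine similarity to the document reaches a threshold is assigned, so a document may receive zero, one, or several labels. This is not the study's exact procedure. The function `toy_encode` is a hypothetical bag-of-words stand-in so the example runs without downloading a model. A real pipeline would replace it with an encoder such as Sentence-BERT or Universal Sentence Encoder. The names `infer_topics` and `cosine`, the topic descriptions, and the 0.2 threshold are illustrative assumptions, not values from the experiments.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_encode(text: str) -> Vector:
    """Hypothetical stand-in for a sentence encoder (bag-of-words counts).

    A real pipeline would return a dense embedding from, e.g., Sentence-BERT
    or Universal Sentence Encoder; only this function would need to change.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tok: float(n) for tok, n in Counter(tokens).items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def infer_topics(
    document: str,
    topics: Dict[str, str],
    encode: Callable[[str], Vector] = toy_encode,
    threshold: float = 0.2,
) -> List[str]:
    """Return every topic label whose description is similar enough to the document.

    Multi-label: any number of labels may pass the threshold. Labels are
    ordered by descending similarity, with ties broken alphabetically.
    """
    doc_vec = encode(document)
    scored = [(cosine(doc_vec, encode(desc)), label) for label, desc in topics.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for score, label in scored if score >= threshold]


if __name__ == "__main__":
    # Illustrative topics, supplied by the user at query time (no training data).
    user_topics = {
        "sports": "sports football game team",
        "politics": "politics election government vote",
        "finance": "stock market finance bank",
    }
    print(infer_topics("The football team won the game after a late goal.", user_topics))
    # → ['sports']
    print(infer_topics(
        "The government announced a stock market rescue plan before the election.",
        user_topics,
    ))
    # → ['finance', 'politics']
```

Because the topics are plain strings passed at call time, new labels can be added without retraining. Only the encoder and the threshold govern the result, which is where the choice between a more general encoder and a faster one would matter.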
