Unsupervised and self-supervised deep learning approaches for biomedical text mining

Biomedical scientific literature is growing at a rapid pace, making it increasingly difficult for human experts to spot the most relevant results hidden in the papers. Automated information extraction tools based on text mining techniques are therefore needed to assist them in this task. In recent years, techniques based on deep neural networks have significantly advanced the state of the art in this research area. While the contribution of supervised methods to this progress is relatively well known, this is less so for other kinds of learning, namely unsupervised and self-supervised learning. Unsupervised learning does not incur the cost of creating labels, which makes it very useful in the exploratory stages of a biomedical study, where agile techniques are needed to rapidly explore many paths. In particular, clustering techniques applied to biomedical text mining make it possible to gather large sets of documents into more manageable groups, and deep learning techniques have been used to produce new clustering-friendly representations of the data. Self-supervised learning, on the other hand, is a kind of supervised learning in which the labels do not have to be created manually by humans but are derived automatically from relations found in the input texts. In combination with innovative network architectures (e.g. transformer-based architectures), self-supervised techniques have enabled the design of increasingly effective vector-based word representations (word embeddings). We show in this survey how word representations obtained in this way interact successfully with common supervised modules (e.g. classification networks), to whose performance they greatly contribute.
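To make the two roles of these representations concrete, the following minimal sketch (not taken from the survey itself) trains self-supervised word embeddings on a toy corpus of biomedical-sounding sentences, where the training signal comes solely from word co-occurrences in the text, and then reuses the resulting document vectors both for unsupervised clustering and as input features to a small supervised classifier. The toy corpus, the mean-pooling of word vectors, and the use of gensim's Word2Vec together with scikit-learn's KMeans and LogisticRegression are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative, not the survey's method): self-supervised
# word embeddings reused for clustering and for a supervised classifier.
import numpy as np
from gensim.models import Word2Vec                    # skip-gram embeddings
from sklearn.cluster import KMeans                    # unsupervised grouping
from sklearn.linear_model import LogisticRegression   # supervised module

# Toy "biomedical" corpus: each document is one tokenized sentence (assumed data).
docs = [
    "aspirin reduces platelet aggregation in cardiovascular patients".split(),
    "ibuprofen inhibits prostaglandin synthesis producing an analgesic effect".split(),
    "brca1 mutation increases hereditary breast cancer risk".split(),
    "tp53 mutation drives tumor progression in many cancers".split(),
]

# Self-supervised step: the "labels" are word co-occurrences derived
# automatically from the text (skip-gram objective), no human annotation.
w2v = Word2Vec(sentences=docs, vector_size=32, window=3, min_count=1, sg=1, seed=0)

# Simple clustering-friendly document representation: mean of word vectors.
doc_vecs = np.array([np.mean([w2v.wv[t] for t in doc], axis=0) for doc in docs])

# Unsupervised use: gather documents into more manageable groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_vecs)
print("cluster assignments:", clusters)

# Supervised use: the same vectors feed a classifier once a few labels exist,
# e.g. drug-related (0) vs. gene-related (1) documents (labels assumed here).
labels = [0, 0, 1, 1]
clf = LogisticRegression().fit(doc_vecs, labels)
print("predicted classes:", clf.predict(doc_vecs))
```

In practice, the survey discusses far richer variants of both steps (e.g. deep autoencoder-based clustering and transformer-based contextual embeddings), but the pipeline shape is the same: representations learned without manual labels are handed to downstream clustering or classification modules.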
