The Effect of Evidence Transfer on Latent Feature Relevance for Clustering

Evidence transfer for clustering is a deep learning method that manipulates the latent representations of an autoencoder according to external categorical evidence, with the effect of improving a clustering outcome. Evidence transfer is designed to be robust when presented with low-quality evidence, while improving clustering accuracy when the evidence is relevant to the task. We interpret the effects of evidence transfer on the latent representation of an autoencoder by comparing our method to the information bottleneck method. The information bottleneck is an optimisation problem that seeks the best trade-off between maximising the mutual information between a data representation and a task outcome, while at the same time compressing the original data source. We posit that the evidence transfer method has essentially the same objective with regard to the latent representations produced by an autoencoder. We verify our hypothesis using information-theoretic metrics from feature selection in order to perform an empirical analysis of the information that is carried through the bottleneck of the latent space. We use the relevance metric to compare the overall mutual information between the latent representations and the ground truth labels before and after their incremental manipulation, as well as to study the effects of evidence transfer on the significance of each latent feature.
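As an illustration of the relevance metric described above, the sketch below estimates the mutual information between each latent feature and the ground-truth labels using scikit-learn's k-nearest-neighbour estimator for continuous features and discrete targets. In information bottleneck terms, this per-feature relevance corresponds to the I(Z;Y) side of the trade-off. The arrays z and y, and the synthetic data that fill them, are hypothetical stand-ins for the autoencoder's latent codes and the true cluster labels, not the paper's actual pipeline.

```python
# Minimal sketch: per-feature "relevance" of a latent representation,
# estimated as the mutual information between each latent feature and
# the ground-truth cluster labels.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's quantities:
# z: latent representations from the autoencoder (n_samples, n_latent)
# y: ground-truth cluster labels (n_samples,)
z = rng.normal(size=(1000, 10))
y = rng.integers(0, 3, size=1000)

# mutual_info_classif uses a k-NN estimator suited to mixed
# continuous/discrete variables; it returns one MI value per feature.
relevance_per_feature = mutual_info_classif(z, y, random_state=0)
overall_relevance = relevance_per_feature.sum()

# Computing these scores before and after evidence transfer shows how
# the manipulation redistributes label information across latent features.
print(relevance_per_feature)
print(overall_relevance)
```

Comparing the two relevance profiles (pre- and post-manipulation) is the empirical analysis the abstract refers to: an increase in overall relevance, or its concentration in fewer features, indicates that the external evidence sharpened the information the bottleneck carries about the true clusters.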
