Autoencoders as Weight Initialization of Deep Classification Networks Applied to Papillary Thyroid Carcinoma

Cancer is one of the most serious health problems of our time. One approach to automatically classifying tumor samples is to analyze molecular information derived from them. Previous work by Teixeira et al. compared different methods of data oversampling and feature reduction, as well as Deep (Stacked) Denoising Autoencoders followed by a shallow layer for classification. In this work, we compare the performance of six different types of Autoencoder (AE), combined with two different approaches to training the classification model: (a) fixing the weights after pre-training an AE, and (b) allowing fine-tuning of the entire network. We also apply two different strategies for embedding the AE into the classification network: (1) importing only the encoding layers, and (2) importing the complete AE. Our best result came from unsupervised feature learning with a single-layer Denoising AE, followed by its complete import into the classification network and subsequent fine-tuning through supervised training, achieving an $F_1$ score of 99.61% ± 0.54. We conclude that a reconstruction of the input space, combined with a deeper classification network, outperforms previous work without resorting to data augmentation techniques.
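The four combinations described above (import strategy (1) or (2), crossed with training approach (a) or (b)) can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the `Dense` class, the `build_classifier` helper, the layer sizes, and the single-layer classification head are all assumptions made for the example, and random weights stand in for a pre-trained AE.

```python
import random
from dataclasses import dataclass


@dataclass
class Dense:
    """Fully connected layer; weights[i][j] maps input i to output j."""
    weights: list
    bias: list
    trainable: bool = True
    relu: bool = True

    def forward(self, x):
        # Computes x @ W + b with plain lists, then an optional ReLU.
        out = []
        for j, b in enumerate(self.bias):
            s = b + sum(x[i] * self.weights[i][j] for i in range(len(x)))
            out.append(max(0.0, s) if self.relu else s)
        return out


def random_dense(n_in, n_out, rng, relu=True):
    """Hypothetical helper: a layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return Dense(w, [0.0] * n_out, relu=relu)


def build_classifier(encoder, decoder, n_classes, import_mode, fine_tune, rng):
    """Assemble a classifier from a pre-trained single-layer AE.

    import_mode: "encoder"  -> strategy (1), import only the encoding layer
                 "complete" -> strategy (2), import encoder and decoder
    fine_tune:   False -> approach (a), imported weights stay fixed
                 True  -> approach (b), the entire network is trainable
    """
    if import_mode == "encoder":
        imported = [encoder]
    elif import_mode == "complete":
        imported = [encoder, decoder]
    else:
        raise ValueError("import_mode must be 'encoder' or 'complete'")

    # Copy the AE weights so the original AE is left untouched.
    layers = [
        Dense([row[:] for row in l.weights], l.bias[:], trainable=fine_tune, relu=l.relu)
        for l in imported
    ]
    # Assumed classification head: one new layer, always trainable.
    head_inputs = len(layers[-1].bias)
    layers.append(random_dense(head_inputs, n_classes, rng, relu=False))
    return layers


def predict(layers, x):
    """Forward pass through all layers; returns unnormalized class scores."""
    for layer in layers:
        x = layer.forward(x)
    return x


rng = random.Random(0)
n_genes, n_hidden = 8, 3  # hypothetical sizes, far smaller than real expression data
encoder = random_dense(n_genes, n_hidden, rng)  # stand-in for pre-trained weights
decoder = random_dense(n_hidden, n_genes, rng)

for mode in ("encoder", "complete"):
    for fine_tune in (False, True):
        net = build_classifier(encoder, decoder, 2, mode, fine_tune, rng)
        print(mode, fine_tune, [layer.trainable for layer in net])
```

In this sketch, importing the complete AE means the classification head sees a reconstruction of the input space (here 8 values) instead of the compressed code (here 3 values), which also makes the classification network one layer deeper.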
