A deep fusion framework for unlabeled data-driven tumor recognition

Abstract Traditional pattern recognition problems are usually solved in two successive stages, representation and classification, which makes generalization ability and stability difficult to guarantee when samples are few or the categories are imbalanced. To tackle these problems, an unlabeled data-driven representation learning classification (RLC) fused model is constructed by integrating representation learning and classification into one model, rather than simply putting the two stages together. The RLC fused model optimizes representation learning and classification interactively and iteratively within a single model, so that the two guide and reinforce each other. Under the RLC framework, a deep nonnegative matrix factorization (NMF) is adopted for representation learning; it combines the advantages of NMF and deep learning while avoiding complex network structures and parameter tuning. The resulting framework is called the deep NMF-RLC fusion model, and it achieves good performance for binary classification even when the simplest linear regression classifier is used. The model exploits useful information embedded in unlabeled data and is suitable for small training sets and unbalanced classification. The performance of the proposed framework is verified on gene-based tumor recognition, covering three stages: early diagnosis, tumor type recognition, and postoperative metastasis. Experiments show that, compared with published state-of-the-art methods and results, the framework yields significant improvements in classification accuracy, specificity, and sensitivity.
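To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It stacks standard Lee-Seung multiplicative-update NMF layer by layer and then fits a ridge-regularized linear regression classifier on the deepest representation. It runs the two steps one after the other, whereas the paper's RLC model optimizes them jointly; the joint objective is not given in the abstract and is not reproduced here. All function names, ranks, iteration counts, and the ridge term are assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor nonnegative X (features x samples) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def deep_nmf(X, ranks, **kwargs):
    """Layer-by-layer factorization: X ~ W1 @ H1, H1 ~ W2 @ H2, ...
    Returns the list of W factors and the deepest representation H."""
    factors, H = [], X
    for rank in ranks:
        W, H = nmf(H, rank, **kwargs)
        factors.append(W)
    return factors, H

def fit_linear_classifier(H_train, y_train, ridge=1e-3):
    """Ridge-regularized least squares on columns of H_train.
    Labels are assumed to be coded as -1 / +1 (binary case)."""
    A = np.vstack([H_train, np.ones((1, H_train.shape[1]))]).T  # add bias
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y_train)

def predict(H, w):
    """Threshold the regression output at zero to get -1 / +1 labels."""
    A = np.vstack([H, np.ones((1, H.shape[1]))]).T
    return np.where(A @ w >= 0, 1, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic nonnegative data: 30 features, 6 labeled + 6 unlabeled samples.
    X = rng.random((30, 12))
    y_labeled = np.array([1, -1, 1, -1, 1, -1], dtype=float)
    # Labeled and unlabeled samples are factorized together, so the
    # unlabeled columns also shape the learned representation.
    _, H = deep_nmf(X, ranks=[8, 4], n_iter=100)
    w = fit_linear_classifier(H[:, :6], y_labeled)
    print(predict(H[:, 6:], w))
```

Factorizing labeled and unlabeled samples in one matrix is the only way this sketch uses unlabeled data; the interaction between classifier and representation that the abstract describes would require feeding the classification loss back into the factor updates.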
