Informed Attentive Predictors: A Generalisable Architecture for Prior Knowledge-Based Assisted Diagnosis of Cancers

Due to the high mortality of many cancers and their related diseases, prediction and prognosis techniques for cancer are being studied extensively to assist doctors in making diagnoses. Many machine-learning-based cancer predictors have been proposed, but few have become widely used because of several crucial problems. For example, most methods require more training data than many institutes can provide, and most ignore the complicated genetic interactions involved in cancers. Moreover, the majority of these assistive models are not safe to use in practice, as they are generally built on black-box machine learners that do not draw on knowledge from the relevant field. We observe that few machine-learning-based cancer predictors are capable of employing prior knowledge (PrK) to mitigate these issues. Therefore, in this paper, we propose a generalisable informed machine learning architecture, the Informed Attentive Predictor (IAP), which makes PrK available to the predictor’s decision-making phases, and we apply it to cancer prediction. Specifically, we implement several variants of the IAP and evaluate them on six TCGA datasets to demonstrate the effectiveness of the architecture as a framework for assistive systems in clinical use. The experimental results show that IAP models achieve noticeably higher accuracy, F1-score and recall than their non-IAP counterparts (i.e., basic predictors).
