Synthetic semi-supervised learning in imbalanced domains: Constructing a model for donor-recipient matching in liver transplantation

Liver transplantation is a promising and widely accepted treatment for patients with terminal liver disease. However, it is limited by the shortage of suitable donors, which leads to a significant number of deaths on the waiting list. This paper proposes a novel donor-recipient allocation system that uses machine learning to predict graft survival after transplantation, using a dataset of donor-recipient pairs from King's College Hospital (United Kingdom). The main novelty of the system is that it tackles the imbalanced nature of the dataset through semi-supervised learning, and analyses its potential for obtaining more robust and equitable models in liver transplantation. We propose two sources of unsupervised (unlabelled) data for this problem (recent transplants and virtual donor-recipient pairs) and two methods for using these data during model construction (a semi-supervised algorithm and a label propagation scheme). The virtual pairs and the label propagation method are shown to alleviate the imbalanced class distribution. Our experiments show that using synthetic and real unsupervised information improves and stabilises model performance and leads to fairer decisions than using supervised data alone. Moreover, the best model is combined with the Model for End-stage Liver Disease (MELD) score, which is currently the most widely used allocation method worldwide. The resulting decision-support system thus considers both donor-recipient compatibility (through our prediction system) and recipient severity (through the MELD score), thereby supporting the principles of fairness and benefit.
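
As an illustration of how a label propagation scheme can assign labels to unlabelled donor-recipient pairs, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the scheme, so the sketch assumes the label spreading iteration F <- alpha*S*F + (1 - alpha)*Y from "Learning with Local and Global Consistency" with an RBF affinity. The function names, the parameters `alpha` and `gamma`, and the two-feature toy data are all hypothetical.

```python
import math

def rbf_affinity(points, gamma=1.0):
    """Pairwise RBF similarities with a zero diagonal (no self-loops)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-gamma * d2)
    return W

def spread_labels(points, labels, n_classes=2, alpha=0.9, gamma=1.0, iters=200):
    """Label spreading; labels[i] is a class index or None if unlabelled."""
    n = len(points)
    W = rbf_affinity(points, gamma)
    deg = [sum(row) for row in W]
    # Symmetrically normalised affinity S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # One-hot initial labels; unlabelled rows are all zeros.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy, made-up data: class 0 is the majority outcome, class 1 the minority.
# Only one minority pair is labelled; the unlabelled pairs (None) stand in
# for recent transplants or virtual donor-recipient pairs.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),   # labelled, class 0
          (3.0, 3.0),                                       # labelled, class 1
          (0.15, 0.15), (0.25, 0.05),                       # unlabelled
          (2.9, 3.1), (3.1, 2.8), (2.8, 2.9)]               # unlabelled
labels = [0, 0, 0, 0, 1, None, None, None, None, None]

predicted = spread_labels(points, labels)
n_minority_before = sum(1 for y in labels if y == 1)
n_minority_after = sum(1 for y in predicted if y == 1)
print(predicted, n_minority_before, n_minority_after)
```

In this toy setting the unlabelled points close to the single labelled minority example inherit its label, so the propagated training set contains more minority examples than the labelled set alone. Whether this carries over to real transplant data depends on the features and the affinity chosen.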
