Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset

Abstract

Testosterone is the most important male sex hormone, and its deficiency causes considerable physical and mental harm. Identifying individuals with low testosterone efficiently is crucial before starting appropriate treatment. However, routine monitoring of testosterone levels can be costly in many regions, which leads to underreporting of cases, especially in developing countries. Moreover, few studies have applied machine learning (ML) to predicting testosterone deficiency. This research therefore aims to offer a coherent comparative analysis of ML methods that can predict testosterone deficiency without requiring patients to undergo costly medical tests. In doing so, we provide the urological community with a publicly available dataset ( https://github.com/osmarluiz/Testosterone-Deficiency-Dataset ) to encourage research in this largely unexplored field. For this analysis, we used ten base classifiers (optimized by grid search with stratified k-fold cross-validation), three ensemble methods, and eight sampling strategies to analyze a total of 3,397 patients. The analysis was based on six features (age, abdominal circumference, triglycerides, high-density lipoprotein, diabetes, and hypertension), all obtained through low-cost, out-of-pocket exams. We compared the performance of the sampling strategies and the classifiers on an independent test set using ranking (PR-AUC), probabilistic (Brier score), and threshold metrics.
We found that: (1) on the ranking metrics, sampling strategies did not improve results in this slightly imbalanced (4:1 ratio) dataset; (2) the ensemble classifier using a weighted average performed best; (3) the best base classifier was XGBoost; (4) calibration yielded significant improvement for the sampling strategies and slight improvement for the no-sampling strategy; (5) McNemar's test showed statistically similar results among all classifiers; and (6) abdominal circumference (AC) had by far the highest feature importance, followed by triglycerides (TG). Age, on the other hand, contributed very little to predicting testosterone deficiency.
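As an illustration of three of the quantities named above (the weighted-average ensemble, the Brier score, and McNemar's test), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy labels and probabilities, the weights, and the choice of the exact (binomial) form of McNemar's test are all assumptions made for this example.

```python
from math import comb


def weighted_average_proba(probas, weights):
    """Combine per-classifier positive-class probabilities by weighted average.

    probas  : list of lists, one list of probabilities per classifier
    weights : one non-negative weight per classifier (hypothetical values)
    """
    total = float(sum(weights))
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs.

    Assumption: the exact binomial variant is used here for simplicity;
    the source does not state which variant of the test was applied.
    """
    only_a = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y and b != y)
    only_b = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a != y and b == y)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the two classifiers never disagree on correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical, not taken from the dataset).
y_true = [1, 0, 0, 0, 1]
proba_a = [0.9, 0.2, 0.1, 0.4, 0.6]
proba_b = [0.7, 0.4, 0.3, 0.2, 0.8]

ensemble = weighted_average_proba([proba_a, proba_b], weights=[2, 1])
print("ensemble probabilities:", ensemble)
print("Brier score (ensemble):", brier_score(y_true, ensemble))

# Threshold at 0.5 to obtain hard predictions for McNemar's test.
pred_a = [int(p >= 0.5) for p in proba_a]
pred_b = [int(p >= 0.5) for p in proba_b]
print("McNemar exact p-value:", mcnemar_exact_p(y_true, pred_a, pred_b))
```

A lower Brier score indicates better-calibrated probabilities, and a large McNemar p-value is consistent with two classifiers making statistically similar errors, which is how finding (5) would read in this framing.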
