Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases

Classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used for risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) from several features. Like statistical inference models, ML models are subject to the common problem of class imbalance: they are biased toward the majority class, which increases the false negative rate when the at-risk class is the minority.
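The effect of class imbalance on the false negative rate can be illustrated with a small sketch. The labels below are hypothetical toy data (95 healthy, 5 at risk), not results from the study, and the helper names `accuracy` and `false_negative_rate` are introduced here only for illustration. The sketch shows that a classifier which always predicts the majority class can reach high accuracy while missing every at-risk subject.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_negative_rate(y_true, y_pred, positive=1):
    """FN / (FN + TP): share of truly positive subjects predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Hypothetical, assumed data: 0 = healthy (majority), 1 = at risk (minority).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
y_pred_majority = [0] * 100

acc = accuracy(y_true, y_pred_majority)
fnr = false_negative_rate(y_true, y_pred_majority)

print(f"accuracy = {acc:.2f}, false negative rate = {fnr:.2f}")
```

Accuracy alone hides the problem in this setting, which is why the false negative rate (or a related measure such as recall on the at-risk class) is usually reported alongside it for imbalanced data.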
