Sports & Nutrition Data Science using Gradient Boosting Machines

In recent years, a significant amount of research has dealt with how humans could beat time. According to its findings, the main weapons in this battle are balanced nutrition and systematic exercise. Outdoor running, from short distances up to marathons, is gaining popularity and is boosting the clothing, footwear and gadget industries as well. Rapid technological development now supplies a vast amount of information from which valuable knowledge can be extracted. Smartwatches, wristbands, mobile phone applications, and sensors in shoes and clothes give practitioners the tools to collect extensive data on their fitness levels. The challenge that data mining now faces is how to combine and analyze the abundance of data from daily activities, nutrition habits and fitness activities in one unified model that could explain the underlying patterns of a subject's physical capacity. The current work collects heterogeneous fitness and nutrition data from students of various athletic backgrounds and skill levels on the University of the Aegean campus, using GPS smartwatches and calorie-monitoring smartwatch applications. After data collection, we apply Gradient Boosting Machines (GBM) to forecast the finishing time of two running tasks, namely 800m and 5000m. The algorithm is trained and evaluated on different attribute sets, focusing on features that describe fitness skills, nutrition habits, or both. The results demonstrate that GBM predicts the finishing time more accurately than other Machine Learning algorithms and traditional models, namely Support Vector Machines, Random Forests, Deep Neural Networks and Linear Regression. Note that, in the current literature on this task, prediction relies almost exclusively on Linear Regression models, whose reported results are far worse than those of GBM.
Finally, the feature importance functionality of GBM is used to reveal the factors that contribute most to the forecast. Professional or amateur athletes could use these factors (with the assistance of professional trainers and experts, of course) to improve their current fitness and health levels.
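
The workflow described above (fit a boosted model on different attribute sets, compare the error, then read off feature importances) can be illustrated with a minimal sketch. It is not the implementation used in this work: it is a from-scratch gradient boosting regressor built on decision stumps with squared-error loss, run on synthetic data. The feature names (`weekly_km`, `resting_hr`, `daily_kcal`), the formula that generates finishing times, and all hyperparameters are assumptions made purely for illustration.

```python
import random

def fit_stump(X, residuals):
    """Best single-feature threshold split by squared-error reduction."""
    n, total = len(X), sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best

def fit_gbm(X, y, n_rounds=200, learning_rate=0.1):
    """Each round fits a stump to the current residuals (squared loss)."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        stumps.append((j, thr, left, right))
        importance[j] += gain  # accumulate error reduction per feature
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    total_gain = sum(importance) or 1.0
    return init, stumps, learning_rate, [g / total_gain for g in importance]

def predict(model, row):
    init, stumps, lr, _ = model
    return init + lr * sum(l if row[j] <= t else r for j, t, l, r in stumps)

def mae(model, X, y):
    return sum(abs(predict(model, r) - t) for r, t in zip(X, y)) / len(y)

# Synthetic stand-in data; names and relationship are invented, not measured.
rng = random.Random(0)
FEATURES = ["weekly_km", "resting_hr", "daily_kcal"]
X, y = [], []
for _ in range(300):
    km, hr, kcal = rng.uniform(5, 60), rng.uniform(45, 80), rng.uniform(1800, 3200)
    seconds_5000m = (1800 - 8 * km + 6 * (hr - 45)
                     + 0.05 * abs(kcal - 2500) + rng.gauss(0, 20))
    X.append([km, hr, kcal])
    y.append(seconds_5000m)

X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
train_mean = sum(y_train) / len(y_train)
baseline_mae = sum(abs(train_mean - t) for t in y_test) / len(y_test)

# Train and evaluate on fitness-only, nutrition-only and combined attributes.
subsets = {"fitness": [0, 1], "nutrition": [2], "both": [0, 1, 2]}
results, importances = {}, {}
for name, cols in subsets.items():
    Xtr = [[row[c] for c in cols] for row in X_train]
    Xte = [[row[c] for c in cols] for row in X_test]
    model = fit_gbm(Xtr, y_train)
    results[name] = mae(model, Xte, y_test)
    importances[name] = dict(zip([FEATURES[c] for c in cols], model[3]))

print("baseline MAE (s):", round(baseline_mae, 1))
for name in subsets:
    print(name, "MAE (s):", round(results[name], 1), importances[name])
```

Here importance is the share of total squared-error reduction attributed to each feature across all boosting rounds, which is one common definition among several. On real data a maintained library implementation and cross-validation would be the more sensible choice; the sketch only makes the mechanics visible.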
