Identification of Alcohol Addicts Among High School Students Using Decision Tree Based Algorithm

This paper applies a decision-tree-based machine learning algorithm to predict possible alcohol addicts among high school students. The data mining process is performed on real-world data collected in two high schools in Portugal. The dataset was originally designed for estimating high school students' performance, with alcohol consumption used as one of the parameters. In the implementation phase, the KNIME Analytics Platform is used to test the model. A significant part of the work is data preprocessing, in which new attributes are derived, including a class attribute labeled using an alcohol addict matrix. Linear correlation is then used to reduce the number of features. Data processing consists of dividing the dataset into training and test data, generating artificial data for the training phase, and analyzing the outputs of the decision tree learner and predictor. The constructed decision tree identifies the connections between certain attributes and student alcohol consumption. Finally, the overall accuracy of the model is measured using a confusion matrix.
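Two of the steps named above, correlation-based feature reduction and accuracy from a confusion matrix, can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the attribute names, the toy values, and the 0.9 correlation threshold are all assumptions chosen for illustration, and the paper does not state which threshold or filtering rule was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    The threshold value is an assumption, not taken from the paper.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels: 1 = addict, 0 = non-addict."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Overall accuracy: correctly classified records over all records."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy attributes; the second is a linear multiple of the first.
columns = {
    "weekday_alcohol": [1, 2, 3, 4, 5],
    "weekend_alcohol": [2, 4, 6, 8, 10],
    "absences": [3, 0, 7, 1, 4],
}
kept_features = drop_correlated(columns)

# Hypothetical test-set labels and predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
overall_accuracy = accuracy(tp, fp, fn, tn)
```

In this toy data, `weekend_alcohol` is dropped because it is perfectly correlated with `weekday_alcohol`, and six of the eight predictions match the actual labels, giving an accuracy of 0.75.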
