Higgs Boson Discovery using Machine Learning Methods with Pyspark

Abstract. The Higgs boson is an elementary particle associated with the field that gives mass to other elementary particles. Its discovery is a major challenge for particle physics. This paper addresses the Higgs boson classification problem with four Machine Learning (ML) methods implemented in the PySpark environment: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF) and Gradient Boosted Tree (GBT). We compare these methods in terms of accuracy and AUC. In the experimentation stage, we use two large datasets: the Higgs dataset from the public UCI repository and a Higgs dataset downloaded from Kaggle.