LightGBM: An Effective miRNA Classification Method in Breast Cancer Patients

miRNAs are small noncoding RNA molecules that are mainly responsible for post-transcriptional control of gene expression. Machine learning is increasingly used in breast tumor classification and diagnosis. In this paper, we compared the performance of different machine learning methods, including Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), for miRNA identification in breast cancer patients. Each algorithm was evaluated on accuracy and logistic loss, and LightGBM performed better in several respects. hsa-mir-139 was identified as an important target for breast cancer classification. As a powerful tool, LightGBM can be used to identify and classify miRNA targets in breast cancer.
