Classification of Neuroblastoma Histopathological Images Using Machine Learning

Neuroblastoma is the most common extracranial solid cancer in young children, accounting for over 15% of childhood cancer deaths. The class of a neuroblastoma is determined by histopathological classification performed by pathologists, which is considered the gold standard. However, because neuroblastic tumours are heterogeneous, the human eye can miss critical visual features in histopathological images. Computer-based models can therefore assist pathologists by providing a quantitative, mathematical analysis of these images. No publicly available dataset of neuroblastoma histopathological images exists, so this study uses a dataset gathered from The Tumour Bank at Kids Research, The Children's Hospital at Westmead, which has been used in previous research. Previous work on this dataset has achieved a maximum accuracy of 84%. One major issue that previous research has not addressed is class imbalance: one class accounts for over 50% of the samples. This study explores a range of feature extraction, undersampling, and oversampling techniques to improve classification accuracy. Using these methods, this study achieves an accuracy of over 90% on the dataset. Moreover, significant improvements were observed in the minority classes, where previous work failed to achieve high classification accuracy. In doing so, this study demonstrates the importance of effectively managing the available data in any application of machine learning.
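The abstract does not state which resampling methods were used, so the following is only an illustrative sketch of the two simplest variants: random undersampling (discard samples from larger classes) and random oversampling (duplicate samples from smaller classes). It also computes per-class accuracy, which shows why a single overall accuracy figure can hide poor performance on minority classes. The function names and the toy labels "A", "B" and "C" are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature vectors by their class label, keeping X/y pairs aligned."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, target):  # sample without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class matches the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        # Keep all originals, then add random duplicates (with replacement).
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    # Toy imbalanced data: class "A" holds more than 50% of the samples.
    X = list(range(10))
    y = ["A"] * 6 + ["B"] * 3 + ["C"]
    print(Counter(random_undersample(X, y)[1]))  # one sample per class
    print(Counter(random_oversample(X, y)[1]))   # six samples per class
    # Always predicting the majority class gives 60% overall accuracy,
    # yet 0% accuracy on both minority classes.
    print(per_class_accuracy(y, ["A"] * len(y)))
```

In a real pipeline, resampling would be applied only to the training split, so that duplicated or discarded images do not distort the accuracy measured on held-out test data.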
