Using Visualization to Illustrate Machine Learning Models for Genomic Data

Massive amounts of genomic data have been generated since the advent of Next Generation Sequencing technologies. Visualizing these complex genomic data requires more than simply plotting them; a visualization should also support a decision or a choice. Machine learning can make predictions and aid decision-making. Machine learning and visualization are both effective ways to deal with big data, but they serve different purposes. Machine learning applies statistical learning techniques to automatically identify patterns in data and make accurate predictions, while visualization leverages the human perceptual system to interpret and uncover hidden patterns in big data. Clinicians, experts and researchers want to use both visualization and machine learning to analyze their complex genomic data, but understanding and trusting machine learning models remains a serious challenge for them in the medical field. This paper addresses this problem by combining intelligent, interactive visualization with machine learning models. Our prototype not only visualizes complex genomic data in a meaningful 3D similarity space, but also illustrates the machine learning models and their real-time prediction results. Interactions and connections between the machine learning model and the 3D scatter plot are also developed and illustrated.
