Machine Learning and Knowledge Extraction in Digital Pathology Needs an Integrative Approach

During the last decade, pathology has benefited from rapid progress in image-digitizing technologies. This progress has led to scanners capable of producing so-called whole slide images (WSI), which a pathologist can explore on a computer screen much as with a conventional microscope, and which can be used for diagnostics, research, archiving, and education and training. Digital pathology is not merely the transfer of the classical microscopic analysis of histological slides by pathologists to a digital visualization. It is a disruptive innovation that will dramatically change medical workflows in the coming years and help to foster personalized medicine. A pathologist becomes really powerful when augmented by machine learning, e.g. by support vector machines, random forests and deep learning. The ultimate benefit of digital pathology is that it enables learning, knowledge extraction and prediction from a combination of heterogeneous data, i.e. the histological image, the patient history and the *omics data. These challenges call for an integrated/integrative machine learning approach that fosters transparency, trust, acceptance and the ability to explain step-by-step why a decision has been made.
