A relative similarity based method for interactive patient risk prediction

This paper investigates the patient risk prediction problem in the context of active learning with relative similarities. Active learning has been extensively studied and successfully applied to solve real problems. The typical setting of active learning methods is to query absolute questions. In a medical application where the goal is to predict the risk of patients on certain disease using Electronic Health Records (EHR), the absolute questions take the form of “Will this patient suffer from Alzheimer’s later in his/her life?”, or “Are these two patients similar or not?”. Due to the excessive requirements of domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, the performance of absolute question focused active learning methods is less stable, since incorrect answers often occur which can be detrimental to the risk prediction model. In this paper, alternatively, we focus on designing relative questions that can be easily answered by domain experts. The proposed relative queries take the form of “Is patient A or patient B more similar to patient C?”, which can be answered by medical experts with more confidence. These questions poll relative information as opposed to absolute information, and even can be answered by non-experts in some cases. In this paper we propose an interactive patient risk prediction method, which actively queries medical experts with the relative similarity of patients. We explore our method on both benchmark and real clinic datasets, and make several interesting discoveries including that querying relative similarities is effective in patient risk prediction, and sometimes can even yield better prediction accuracy than asking for absolute questions.

[1]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[2]  Rong Jin,et al.  Batch mode active learning and its application to medical image classification , 2006, ICML.

[3]  Jimeng Sun,et al.  Patient Risk Prediction Model via Top-k Stability Selection , 2013, SDM.

[4]  Fei Wang,et al.  Composite distance metric integration by leveraging multiple experts' inputs and its application in patient similarity assessment , 2012, Stat. Anal. Data Min..

[5]  Raymond J. Mooney,et al.  Diverse ensembles for active learning , 2004, ICML.

[6]  Tong Zhang,et al.  The Value of Unlabeled Data for Classification Problems , 2000, ICML 2000.

[7]  Sethuraman Panchanathan,et al.  Batch Mode Active Sampling Based on Marginal Probability Distribution Matching , 2013, ACM Trans. Knowl. Discov. Data.

[8]  Panagiotis G. Ipeirotis,et al.  Repeated labeling using multiple noisy labelers , 2012, Data Mining and Knowledge Discovery.

[9]  Craig A. Knoblock,et al.  Selective Sampling with Redundant Views , 2000, AAAI/IAAI.

[10]  Nitesh V. Chawla,et al.  Time to CARE: a collaborative engine for practical disease prediction , 2010, Data Mining and Knowledge Discovery.

[11]  Jason Roy,et al.  Prediction Modeling Using EHR Data: Challenges, Strategies, and a Comparison of Machine Learning Approaches , 2010, Medical care.

[12]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2008, IEEE Trans. Knowl. Data Eng..

[13]  Mark Craven,et al.  Multiple-Instance Active Learning , 2007, NIPS.

[14]  Jun Wang,et al.  Fast Pairwise Query Selection for Large-Scale Active Learning to Rank , 2013, 2013 IEEE 13th International Conference on Data Mining.

[15]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[16]  Xia Wang,et al.  Actively learning to infer social ties , 2012, Data Mining and Knowledge Discovery.

[17]  Diane J. Cook,et al.  Ask me better questions: active learning queries based on rule induction , 2011, KDD.

[18]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[19]  Aristides Gionis,et al.  Estimating entity importance via counting set covers , 2012, KDD.

[20]  Andrew McCallum,et al.  Reducing Labeling Effort for Structured Prediction Tasks , 2005, AAAI.

[21]  Mark Craven,et al.  An Analysis of Active Learning Strategies for Sequence Labeling Tasks , 2008, EMNLP.

[22]  Andreas Krause,et al.  Advances in Neural Information Processing Systems (NIPS) , 2014 .

[23]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[24]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[25]  Fei Wang,et al.  Supervised patient similarity measure of heterogeneous patient records , 2012, SKDD.

[26]  Russell Greiner,et al.  Optimistic Active-Learning Using Mutual Information , 2007, IJCAI.

[27]  Jun Wang,et al.  Active Learning to Rank using Pairwise Supervision , 2013, SDM.

[28]  Jun Wang,et al.  Exploring Patient Risk Groups with Incomplete Knowledge , 2013, 2013 IEEE 13th International Conference on Data Mining.

[29]  Rina Panigrahy,et al.  An Improved Algorithm Finding Nearest Neighbor Using Kd-trees , 2008, LATIN.

[30]  Nebojsa Jojic,et al.  Active spectral clustering via iterative uncertainty reduction , 2012, KDD.

[31]  J. Lafferty,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[32]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[33]  G. Niklas Norén,et al.  Temporal pattern discovery in longitudinal electronic patient records , 2010, Data Mining and Knowledge Discovery.

[34]  Andrew McCallum,et al.  Toward Optimal Active Learning through Sampling Estimation of Error Reduction , 2001, ICML.

[35]  Hua Xu,et al.  Applying active learning to high-throughput phenotyping algorithms for electronic health records data. , 2013, Journal of the American Medical Informatics Association : JAMIA.

[36]  Michael R. Berthold,et al.  Active learning for object classification: from exploration to exploitation , 2009, Data Mining and Knowledge Discovery.

[37]  Eric Horvitz,et al.  Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning , 2007, IJCAI.