Ranking Pathology Data in the Absence of a Ground Truth

Pathology results play a critical role in medical decision making. A particular challenge is the large number of pathology results that doctors are presented with on a daily basis, so some form of pathology result prioritisation is a necessity. However, there is no readily available training data that would support a traditional supervised learning approach, and alternative solutions are therefore needed. Two approaches are presented in this paper: anomaly-based unsupervised pathology prioritisation and proxy ground truth-based supervised pathology prioritisation. Two variations of each were considered: point-based and time series-based unsupervised anomaly prioritisation for the first, and kNN and RNN proxy ground truth-based supervised prioritisation for the second. Urea and Electrolytes pathology testing was used as the focus of the work. The reported evaluation indicated that the RNN proxy ground truth-based supervised pathology prioritisation method produced the best results.
