Enhancing the security of patients' portals and websites by detecting malicious web crawlers using machine learning techniques

INTRODUCTION
There is increasing demand for access to medical information via patient portals. However, maintaining the security of these portals is one of the challenges to the widespread use of such services. Recent reports show an alarming increase in cyber-attacks carried out with crawlers: software programs that traverse web pages and can execute various commands, such as attacking web servers, cracking passwords, harvesting users' personal information and testing servers for vulnerabilities. The aim of this research is to develop a new, effective model for detecting malicious crawlers from their navigational behaviour using machine-learning techniques.

METHOD
In this research, different methods of crawler detection were investigated. Log files from a sample of compromised websites were analysed, and the best features for crawler detection were extracted. Several machine-learning algorithms, including Support Vector Machine (SVM), Bayesian Network, Decision Tree and artificial neural network (ANN), were then tested and compared. The best model was developed using the most appropriate features, and its accuracy was evaluated.

RESULTS
Our analysis showed that SVM-based models achieved a higher f-measure (0.97) for detecting malicious crawlers than Bayesian Network (f-measure = 0.88), Decision Tree (f-measure = 0.95) and ANN (f-measure = 0.87). Extracting proper features further improved the performance of SVM (f-measure = 0.98), Bayesian Network (f-measure = 0.94), Decision Tree (f-measure = 0.96) and ANN (f-measure = 0.92).

CONCLUSION
Security concerns are among the potential barriers to the widespread use of patient portals. Machine-learning algorithms can accurately detect malicious crawlers and thereby enhance the security of sensitive patient information. Selecting appropriate features for the development of these algorithms can markedly increase their accuracy, most notably for the Bayesian Network and ANN.
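The abstract does not list which navigational features were extracted from the log files, so the following is only a minimal sketch of the general pipeline, not the authors' implementation. It assumes simplified log records, groups requests into sessions by IP address and user agent with an assumed 30-minute inactivity timeout, and computes a few features commonly used in the crawler-detection literature (HEAD-request ratio, 4xx error ratio, image-request ratio, robots.txt access, mean interval between requests). All names (`LogRecord`, `sessionize`, `session_features`, `f_measure`) are hypothetical. The last function shows how the f-measure reported in the results is computed: the harmonic mean of precision and recall for the "crawler" class.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Assumed heuristic: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


@dataclass
class LogRecord:
    """Hypothetical, already-parsed line of a web server access log."""
    ip: str
    user_agent: str
    timestamp: float  # seconds since epoch
    method: str       # e.g. "GET", "HEAD"
    path: str
    status: int       # HTTP status code


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group records by (ip, user_agent), splitting a client's requests into
    separate sessions whenever consecutive requests are more than `timeout`
    seconds apart."""
    by_client = defaultdict(list)
    for r in records:
        by_client[(r.ip, r.user_agent)].append(r)
    sessions = []
    for client_records in by_client.values():
        client_records.sort(key=lambda r: r.timestamp)
        current = [client_records[0]]
        for prev, cur in zip(client_records, client_records[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


def session_features(session):
    """Navigational features for one session (an assumed, illustrative set)."""
    n = len(session)
    gaps = [b.timestamp - a.timestamp for a, b in zip(session, session[1:])]
    return {
        "total_requests": n,
        "head_ratio": sum(r.method == "HEAD" for r in session) / n,
        "error_ratio": sum(400 <= r.status < 500 for r in session) / n,
        "image_ratio": sum(r.path.lower().endswith(IMAGE_EXTENSIONS) for r in session) / n,
        "robots_txt": int(any(r.path == "/robots.txt" for r in session)),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
    }


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive ("crawler") class: 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Each feature dictionary would become one row of the training data for the classifiers compared in the study (SVM, Bayesian Network, Decision Tree, ANN), with sessions labelled as crawler or human. Ratio features are used rather than raw counts so that long and short sessions remain comparable.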
