APPLICATION OF k-NEAREST NEIGHBOUR CLASSIFICATION IN MEDICAL DATA MINING IN THE CONTEXT OF KENYA

Medical data is an ever-growing source of information, generated by hospitals in the form of patient records. When mined, the information hidden in these records is a huge resource bank for medical research. The data contains hidden patterns and relationships that can lead to better diagnosis; unfortunately, these patterns and relationships often go unexploited. Studies in medical diagnosis have predicted heart diseases, lung diseases, and various tumours from data previously collected from patients. However, they are mostly limited to domain-specific systems that predict only the diseases within their own area of operation. Furthermore, the performance of the k-nearest neighbours (k-NN) classifier is highly dependent on the distance metric used to identify the k nearest neighbours of a query point; the standard Euclidean distance is commonly used in practice. This study draws on a large store of historical patient records so that diagnoses can be based on historical data. It focuses on computing the probability of occurrence of a particular ailment using a unique k-NN algorithm, which increases the accuracy of such diagnoses. The algorithm can be used to enhance automated diagnosis, including the diagnosis of multiple diseases that show similar symptoms. To validate the experimental results, a hypothesis was tested for the following variables: accidents, age, allergies, blood pressure, smoking habit, total cholesterol, diabetes and hypertension, family history of heart disease, obesity, and lack of physical activity. The results showed a strong relationship between these variables and the causes of common chronic diseases such as heart disease, diabetes, and cancer.

Keywords: k-NN, classification, algorithm
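The abstract does not spell out the algorithm, so the following is only a minimal sketch of a plain k-NN classifier using the standard Euclidean distance mentioned above, not the study's own method. It assumes that the "probability of occurrence" of an ailment is estimated as the fraction of the k nearest historical records that carry that diagnosis. The min-max rescaling step is an added assumption: without it, features with large numeric ranges (for example blood pressure in mmHg) would dominate the Euclidean distance. All function names, feature choices, and the sample records are hypothetical and for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_min_max(rows: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) bounds computed from the training records (assumed preprocessing)."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(row: Sequence[float], bounds: List[Tuple[float, float]]) -> List[float]:
    """Rescale a record with the training bounds; constant features map to 0."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]


def knn_probabilities(train_X, train_y, query, k: int = 3) -> Dict[str, float]:
    """Fraction of the k nearest training records carrying each diagnosis label."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training records")
    # Sort (record, label) pairs by distance to the query and keep the k closest.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return {label: count / k for label, count in votes.items()}


def knn_predict(train_X, train_y, query, k: int = 3) -> str:
    """Majority-vote label; ties go to the label whose first member appears nearest."""
    probs = knn_probabilities(train_X, train_y, query, k)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical records: [age, systolic BP (mmHg), total cholesterol (mmol/L), smoker (0/1)].
    records = [
        [63, 150, 6.8, 1], [58, 145, 6.2, 1], [70, 160, 7.1, 0],
        [34, 118, 4.6, 0], [29, 112, 4.2, 0], [41, 122, 5.0, 1],
    ]
    labels = ["high_risk"] * 3 + ["low_risk"] * 3
    bounds = fit_min_max(records)
    scaled = [apply_min_max(r, bounds) for r in records]
    patient = apply_min_max([55, 140, 6.0, 1], bounds)
    print(knn_probabilities(scaled, labels, patient, k=3))
    print(knn_predict(scaled, labels, patient, k=3))
```

Swapping `euclidean` for another metric changes which records count as "nearest", which is why the choice of distance metric has such a strong effect on k-NN performance.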
