Web Information Extraction for Finding Remedy Based on a Patient-Authored Text: A Study on Homeopathy

Automatic medical diagnosis and remedy finding has been an active research area for decades. The growing tendency to seek health remedies on the internet has created a need for research on the analysis of patient-authored text. Focussed analysis of patient-authored text can also help in automatic remedy finding. As the web contains a huge amount of medicine- and diagnosis-related information, an intelligent system can extract the relevant information to provide a health remedy in response to a patient-authored text query. In this paper, we attempt to develop such a system. As the patient's description of suffering plays a key role in homeopathic remedy finding, we focus on the homeopathy domain. To the best of our knowledge, this is the first attempt in this domain. First, the patient-authored text is processed to identify the disease name and characteristic symptoms. Then a query is formed and a set of relevant web pages is retrieved. The retrieved pages are processed at multiple levels to extract the medicine names. The appropriateness of each medicine is computed using a hybrid similarity scoring technique, and the medicine with the highest similarity is suggested to the user. The system is tested using a set of real questions collected from various relevant websites. The evaluation results demonstrate that the system recommends a relevant remedy in 96.33% of cases.
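The final ranking step can be illustrated with a minimal sketch. The abstract does not specify how the hybrid similarity score is composed, so the version below is only an assumption: it blends Jaccard overlap with term-frequency cosine similarity, using a hypothetical weight `alpha`. The function names, the placeholder medicine names and the sample texts are all illustrative and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a, b):
    """Set overlap between two token lists (0.0 if either is empty)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(query, description, alpha=0.5):
    """Assumed hybrid score: weighted mix of Jaccard and cosine similarity."""
    q, d = tokenize(query), tokenize(description)
    return alpha * jaccard(q, d) + (1 - alpha) * cosine(q, d)


def suggest(query, candidates, alpha=0.5):
    """Return the candidate whose extracted description scores highest."""
    return max(candidates, key=lambda name: hybrid_score(query, candidates[name], alpha))


if __name__ == "__main__":
    # Placeholder data standing in for text extracted from retrieved pages.
    query = "throbbing headache worse in sunlight"
    candidates = {
        "medicine_a": "throbbing headache worse from sunlight and noise",
        "medicine_b": "dry cough at night with sore throat",
    }
    print(suggest(query, candidates))
```

In this sketch the query stands for the disease name and symptoms identified in the patient's text, and each candidate description stands for the text associated with a medicine name extracted from the web pages. Mixing a set-based and a frequency-based measure is one plausible reading of "hybrid"; the actual system may combine different measures.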
