Diversity-aware retrieval of medical records

A medical ontology is used to address ambiguities in a medical query. Multiple sub-queries are constructed from the original medical query. Medical record relevance and novelty are combined for ranking. Empirical experiments demonstrate the effectiveness of the approach. A pilot study on its real-world application is given for better medical service.

The wide adoption of Electronic Medical Records (EMRs) has caused explosive growth in medical and clinical data, making medical search technologies critical for finding useful patient information in large medical datasets. However, high-quality medical search is a challenging task, in particular because of the inherent complexity and ambiguity of medical terminology. In this paper, by exploiting the uncertainty in ambiguous medical queries, we propose a novel semantic-based approach to diversity-aware retrieval of EMRs, i.e., both relevance and novelty are considered in EMR ranking. With the support of medical domain ontologies, we first mine all the potential semantics (concepts and the relations between them) from a user query and use them to model the multiple query aspects. Then, we propose a novel diversification strategy, which considers not only aspect importance but also aspect similarity, to perform the diversity-aware EMR ranking. A real-world pilot study, which uses the proposed medical search approach to improve the secondary use of EMRs, is reported. We believe that our experience can serve as an important reference for the development of similar applications in a medical data utilization and sharing environment.
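To make the idea of combining relevance and novelty concrete, the following is a minimal sketch of a greedy, aspect-based re-ranking loop. It is not the strategy proposed in the paper: the function name `diversify`, the trade-off parameter `lam`, the scoring formula, and the toy data are all assumptions made for illustration, and the sketch models only aspect importance (it leaves out aspect similarity).

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],
    aspect_weights: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k records, trading off relevance against novelty.

    relevance:      record id -> relevance to the original query (assumed given)
    aspect_weights: query aspect -> importance (hypothetical values)
    coverage:       record id -> {aspect -> degree to which the record covers it}
    lam:            0.0 ranks by relevance only; larger values favour novelty
    """
    selected: List[str] = []
    # Degree to which each aspect is still uncovered by the selected records.
    uncovered = {aspect: 1.0 for aspect in aspect_weights}
    candidates = set(relevance)

    def score(record: str) -> float:
        # Novelty rewards covering important aspects that are still uncovered.
        novelty = sum(
            weight * coverage[record].get(aspect, 0.0) * uncovered[aspect]
            for aspect, weight in aspect_weights.items()
        )
        return (1.0 - lam) * relevance[record] + lam * novelty

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen record contribute less from now on.
        for aspect in aspect_weights:
            uncovered[aspect] *= 1.0 - coverage[best].get(aspect, 0.0)
    return selected


# Toy data (invented): an ambiguous query with two possible aspects.
relevance = {"r1": 0.9, "r2": 0.8, "r3": 0.5}
aspect_weights = {"aspect_a": 0.6, "aspect_b": 0.4}
coverage = {
    "r1": {"aspect_a": 1.0},
    "r2": {"aspect_a": 1.0},
    "r3": {"aspect_b": 1.0},
}

print(diversify(relevance, aspect_weights, coverage, k=3))           # ['r1', 'r3', 'r2']
print(diversify(relevance, aspect_weights, coverage, k=3, lam=0.0))  # ['r1', 'r2', 'r3']
```

In this toy example, once `r1` has covered `aspect_a`, the less relevant `r3` is promoted above `r2` because it is the only record covering `aspect_b`; with `lam=0.0` the ranking falls back to pure relevance order.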
