Dynamic Cluster-based Retrieval and Discovery for Biomedical Literature

Driven by increased specialization and experimentation, the volume of biomedical literature is growing rapidly, and current search and retrieval systems can no longer support effective and efficient knowledge discovery. Standard information retrieval systems such as PubMed assume that users have prior knowledge of the topic and expect them to formulate a suitable query for the information they seek. User feedback mechanisms exist to help users reformulate their queries, but these still assume that users know how to narrow the search results with additional keywords and/or filters. As an alternative, we revisit the Scatter/Gather information retrieval paradigm. Specifically, we explore a real-time, dynamic, cluster-based document browsing approach in the biomedical domain, discuss a system architecture that involves keyword discovery and dynamic clustering, and present a working prototype with a relevant use case, compared against a standard ranking-based information retrieval system.
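To make the Scatter/Gather loop concrete, the following is a minimal sketch rather than the prototype's actual implementation. It assumes whitespace tokenization, smoothed TF-IDF weighting, and a plain spherical k-means for the scatter step; the function names (`tfidf_vectors`, `scatter`, `top_terms`, `gather`) and the toy documents are hypothetical and chosen only for illustration. The highest-weighted centroid terms stand in for keyword discovery.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF dict (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Unit-length sum of the member vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def scatter(vectors, k, iterations=10, seed=0):
    """Scatter step: spherical k-means. Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # seed centers with k random documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar center.
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        # Recompute each non-empty cluster's center from its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


def top_terms(center, n=3):
    """Keyword summary: the n highest-weighted terms of a cluster center."""
    ranked = sorted(center.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


def gather(docs, labels, chosen):
    """Gather step: keep only documents whose cluster the user selected."""
    return [d for d, lab in zip(docs, labels) if lab in chosen]


if __name__ == "__main__":
    corpus = [
        "crispr guide rna genome editing",
        "guide rna engineering improves crispr editing",
        "cas9 genome editing in plants",
        "k-means clustering of document collections",
        "document clustering for exploratory search",
        "cluster based browsing of search results",
    ]
    labels, centers = scatter(tfidf_vectors(corpus), k=2)
    for c, center in enumerate(centers):
        print(c, top_terms(center), gather(corpus, labels, {c}))
```

In an interactive session, the documents returned by `gather` would be passed back through `tfidf_vectors` and `scatter`, so that each round re-clusters only the subset the user selected.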
