SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

The COVID-19 pandemic has sparked an unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track of new work and explore new directions. Search engines are designed for targeted queries, not for discovering connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research that integrates two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and a 13% return rate.
