Exploration and Discovery of the COVID-19 Literature through Semantic Visualization

We are developing semantic visualization techniques to enhance exploration and enable discovery over large datasets of complex networks of relations. Semantic visualization does this by exploiting the semantics of the relations in the data. It involves (i) using NLP to extract named entities, relations, and knowledge graphs from the original data; (ii) indexing the output and creating representations for all relevant entities and relations that can be visualized in many different ways, e.g., as tag clouds, heat maps, or graphs; and (iii) applying parameter reduction operations to the extracted relations, creating "relation containers", or functional entities, that can be visualized using the same methods. This allows the visualization of multiple relations and partial pathways, and exploration across multiple dimensions. Our hope is that this will enable the discovery of novel inferences over relations in complex data that would otherwise go unnoticed. We have applied this approach to the analysis of the recently released CORD-19 dataset.
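Steps (ii) and (iii) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triples are hand-written stand-ins for the NLP output of step (i), the names `Relation`, `entity_counts`, and `relation_containers` are hypothetical, and "parameter reduction" is assumed here to mean abstracting away one argument of a relation so that the remainder can be treated as a single functional entity.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical triples standing in for the extraction output of step (i).
RELATIONS = [
    Relation("ACE2", "binds", "spike protein"),
    Relation("TMPRSS2", "cleaves", "spike protein"),
    Relation("ACE2", "binds", "angiotensin II"),
    Relation("furin", "cleaves", "spike protein"),
]


def entity_counts(relations):
    """Step (ii): count entity mentions, e.g. to size terms in a tag cloud."""
    counts = Counter()
    for r in relations:
        counts[r.subject] += 1
        counts[r.obj] += 1
    return counts


def relation_containers(relations):
    """Step (iii), under the assumption stated above: drop the subject slot
    and group relations by (predicate, object). Each key is a 'container'
    that maps to the set of subjects filling the removed slot."""
    containers = defaultdict(set)
    for r in relations:
        containers[(r.predicate, r.obj)].add(r.subject)
    return dict(containers)


counts = entity_counts(RELATIONS)
containers = relation_containers(RELATIONS)
```

Because each container is keyed like an entity, it can be counted and displayed with the same views as the entities themselves, for example by sizing the container `("cleaves", "spike protein")` according to how many subjects it holds.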
