The ever-increasing volume of scientific literature calls for better systems to help researchers find relevant papers and summarize their essential claims. Previous research has shown that a large portion of literature search queries are entity-set queries, that is, queries containing multiple entities of possibly different types. These queries reflect users’ need to find documents that reveal inter-entity relationships, and they pose non-trivial challenges to existing search systems that model each entity independently. In this project, we bring together a team of computing and biomedical experts to develop SetSearch+, an entity-set-aware search and analysis system for scientific literature. SetSearch+ first leverages a data-driven text mining pipeline to extract typed entities for building entity-enhanced indices. It then adopts a novel entity-set-aware ranking model for online document retrieval, which captures entity type information and relations among entity sets. Finally, it summarizes the top-ranked documents into a concise, interpretable, and interactive concept graph, which lets a user quickly grasp the gist of all documents and thereby accelerates the knowledge discovery process. Users interact with the SetSearch+ system through a web-based interface.

ACM Reference Format: Jiaming Shen1*, Jinfeng Xiao1*, Yu Zhang1, Carl Yang1, Jingbo Shang1, Jinda Han1, Saurabh Sinha1, Peipei Ping2, Richard Weinshilboum3, Zhiyong Lu4, Jiawei Han1. 2018. SetSearch+: Entity-Set-Aware Search and Mining for Scientific Literature. In Proceedings of The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’18). ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
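To make the contrast between independent and entity-set-aware scoring concrete, the following is a minimal toy sketch. It is not the SetSearch+ ranking model: the `Entity` class, both scoring functions, the same-type weight of 0.5, and the example documents are all hypothetical and chosen only for illustration. The sketch assumes each document is represented by counts of its extracted typed entities.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    """A typed entity as it might come out of an extraction pipeline (hypothetical)."""
    name: str
    etype: str  # e.g. "Gene" or "Disease"


def independent_score(doc_counts, query):
    """Baseline: sum the mention counts of each query entity separately."""
    return float(sum(doc_counts.get(e, 0) for e in query))


def entity_set_score(doc_counts, query, same_type_weight=0.5):
    """Toy set-aware score: reward pairs of query entities that co-occur.

    Each pair contributes the smaller of the two mention counts, so a
    document mentioning only one entity of a pair gets nothing for it.
    Cross-type pairs get full weight; same-type pairs get an assumed
    weight of 0.5.
    """
    score = 0.0
    for a, b in combinations(query, 2):
        weight = 1.0 if a.etype != b.etype else same_type_weight
        score += weight * min(doc_counts.get(a, 0), doc_counts.get(b, 0))
    return score


def rank(docs, query, scorer):
    """Return document ids ordered by descending score."""
    return sorted(docs, key=lambda d: scorer(docs[d], query), reverse=True)


brca1 = Entity("BRCA1", "Gene")
tp53 = Entity("TP53", "Gene")
cancer = Entity("breast cancer", "Disease")

# Hypothetical documents: entity -> number of mentions.
docs = {
    "d1": {brca1: 1, cancer: 1},  # mentions both query entities
    "d2": {brca1: 5, tp53: 2},    # mentions one query entity many times
    "d3": {cancer: 1},            # mentions one query entity once
}
query = [brca1, cancer]

print(rank(docs, query, independent_score))
print(rank(docs, query, entity_set_score))
```

Under the baseline, the document that repeats a single query entity ranks first; under the pairwise score, the document that mentions both entities together does, which is the behavior an entity-set query is meant to favor.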