SetSearch+: Entity-Set-Aware Search and Mining for Scientific Literature

The ever-increasing volume of scientific literature calls for better systems that help researchers find relevant papers and summarize their essential claims. Previous research has shown that a large portion of literature search queries are entity-set queries, that is, queries containing multiple entities of possibly different types. These queries reflect users’ need to find documents that reveal inter-entity relationships, and they pose non-trivial challenges to existing search systems that model each entity independently. In this project, we bring together a team of computing and biomedical experts to develop SetSearch+, an entity-set-aware search and analysis system for scientific literature. First, SetSearch+ leverages a data-driven text mining pipeline to extract typed entities and build entity-enhanced indices. Then, it adopts a novel entity-set-aware ranking model for online document retrieval, which captures entity type information and relations among entity sets. Furthermore, it summarizes top-ranked documents into a concise, interpretable, and interactive concept graph, which lets a user quickly grasp the gist of the retrieved documents and thereby accelerates the knowledge discovery process. Users interact with SetSearch+ through a web-based interface.

ACM Reference Format: Jiaming Shen, Jinfeng Xiao, Yu Zhang, Carl Yang, Jingbo Shang, Jinda Han, Saurabh Sinha, Peipei Ping, Richard Weinshilboum, Zhiyong Lu, Jiawei Han. 2018. SetSearch+: Entity-Set-Aware Search and Mining for Scientific Literature. In Proceedings of The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’18). ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
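To make the idea of an entity-enhanced index and an entity-set query more concrete, the following is a minimal sketch and not the SetSearch+ implementation. It assumes documents have already been annotated with typed entities by some upstream extractor. The toy corpus, the function names `build_entity_index` and `rank`, and the scoring rule (one point per matched entity plus a bonus for each pair of query entities that co-occur in a document) are all hypothetical stand-ins for the text mining pipeline and ranking model described in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: doc id -> {entity name: entity type}.
# In a real system these annotations would come from an entity extractor.
DOCS = {
    "d1": {"BRCA1": "gene", "breast cancer": "disease"},
    "d2": {"BRCA1": "gene", "TP53": "gene"},
    "d3": {"breast cancer": "disease", "tamoxifen": "chemical"},
    "d4": {"BRCA1": "gene", "breast cancer": "disease", "tamoxifen": "chemical"},
}


def build_entity_index(docs):
    """Map each typed entity (name, type) to the set of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for name, etype in entities.items():
            index[(name, etype)].add(doc_id)
    return index


def rank(query, index, pair_bonus=1.0):
    """Score documents for an entity-set query (a list of (name, type) pairs).

    Assumed toy scoring: +1 for every query entity a document mentions,
    plus pair_bonus for every pair of query entities it mentions together,
    so documents covering the set jointly outrank partial matches.
    """
    scores = defaultdict(float)
    for entity in query:
        for doc_id in index.get(entity, ()):
            scores[doc_id] += 1.0
    for a, b in combinations(query, 2):
        for doc_id in index.get(a, set()) & index.get(b, set()):
            scores[doc_id] += pair_bonus
    # Highest score first; ties broken by doc id for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = build_entity_index(DOCS)
    query = [("BRCA1", "gene"), ("breast cancer", "disease")]
    for doc_id, score in rank(query, index):
        print(doc_id, score)
```

In this sketch, documents that mention both query entities (`d1` and `d4`) receive the co-occurrence bonus and rank above documents that mention only one of them, which illustrates the difference from scoring each entity independently.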