Scalable keyword-based information retrieval has dominated the search industry for decades. To perform a sophisticated intelligence search and analysis task, a user must pose the right query, read multiple retrieved articles, understand their main content, discover more relevant terms, and iterate. This process is often ad hoc and, in many cases, very challenging, especially when researchers start to explore a field they are not familiar with. For tasks such as summarizing research efforts in one area, an analyst needs to interact with a keyword-based search engine for a long time before a reasonable, comprehensive technical report can be written. In this work, we developed a network-based, unified search and navigation platform, called FTS (Faceted Taxonomy Construction and Search), to ease query development and facilitate intelligence exploration in a large text repository, with a focus on scientific publications. It leverages the latest phrase mining, concept embedding, and deep learning techniques to automatically extract concept terms and link them in a taxonomy structure, which could facilitate many downstream applications, including summarization, trend analysis, document categorization, and recommendation.