With the dramatic increase in the number of scientific research papers, scientific paper mining systems have become increasingly popular for efficient paper retrieval and analysis. However, existing keyword-based search engines and mining systems based on language models or topic models cannot provide customized queries that meet diverse user requirements. In this paper, we therefore propose a novel TAIL (Time-Author-Institute-Literature) model that captures the relationships among literature, authors, institutes, and time stamps. Based on the TAIL model, we implement the Massive Scientific Paper Mining (MSPM) system with a B/S (Browser/Server) architecture for web services. Evaluation results on large-scale real data show that the MSPM system can deliver desirable mining results and provide valuable data support for scientific research cooperation.