Paper information - PandaSearch: A fine-grained academic search engine for research documents

PandaSearch: A fine-grained academic search engine for research documents

In academia, research documents enable the sharing and dissemination of scientific discoveries. In this era of “big data”, academic search engines are widely used to find relevant research documents. In computer science, for example, a researcher often issues a query with a specific goal, such as finding an algorithm or a theorem. To date, however, most search engines return just a list of related papers. Users have to browse the results, download the papers of interest, and look for the desired information, which is laborious and inefficient. In this paper, we present a novel academic search system, called PandaSearch, that returns results through a fine-grained interface in which the results are organized by category, such as definitions, theorems, lemmas, algorithms, and figures. The key technical challenges in our system include the automatic identification and extraction of the different parts of a research document, the discovery of the main topic phrases for a definition or a theorem, and the recommendation of related definitions or figures to satisfy the user's search intent. Based on this, we have built a user-friendly search interface that lets users conveniently explore the documents and find the relevant information.

Feiran Huang | Tok Wang Ling | Jiaheng Lu | Jia Li | Zhaoan Dong
