AKMiner: Domain-Specific Knowledge Graph Mining from Academic Literatures

Existing academic search systems such as Google Scholar usually return a long list of scientific articles for a given research domain or topic (e.g., “document summarization” or “information extraction”), and users must read many of those articles to get a sense of the research progress in the domain, which is tedious and time-consuming. In this paper, we propose a novel system called AKMiner (Academic Knowledge Miner) that automatically mines useful knowledge from the articles in a specific domain and then presents it to users visually as a knowledge graph. Our system consists of two major components: a) the extraction module, which jointly extracts academic concepts and relations using a Markov Logic Network, and b) the visualization module, which generates knowledge graphs, including concept-cloud graphs and concept relation graphs. Experimental results demonstrate the effectiveness of each component of the proposed system.
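To make the two graph types more concrete, the following Python sketch shows one possible way the output of an extraction step could be turned into data for a concept cloud and a concept relation graph. It is an illustration under stated assumptions, not the AKMiner implementation: the triple format, the relation labels, and the function names `concept_cloud` and `relation_graph` are all hypothetical, and the Markov Logic Network extraction itself is not modeled here.

```python
from collections import Counter, defaultdict

# Hypothetical extractor output: (concept, relation, concept) triples.
# Both the format and the relation labels are assumptions for illustration.
triples = [
    ("TextRank", "applied_to", "keyphrase extraction"),
    ("PageRank", "basis_of", "TextRank"),
    ("TextRank", "applied_to", "document summarization"),
]


def concept_cloud(triples):
    """Count how often each concept occurs; counts could set font size in a cloud."""
    counts = Counter()
    for head, _relation, tail in triples:
        counts[head] += 1
        counts[tail] += 1
    return counts


def relation_graph(triples):
    """Build an adjacency list: concept -> list of (relation, concept) edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


cloud = concept_cloud(triples)
graph = relation_graph(triples)
```

In this sketch, concepts that take part in more extracted relations receive a larger weight in the cloud, while the adjacency list keeps the directed, labeled edges that a relation graph would draw.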
