Entity Network Extraction Based on Association Finding and Relation Extraction

One of the core aims of semantic search is to present users directly with information instead of lists of documents. Various entity-oriented tasks have been or are being considered, including entity search and related entity finding. In the context of digital libraries for computational humanities, we consider another task, network extraction: given an input entity and a document collection, extract related entities from the collection and present them as a network. We develop a combined approach to entity network extraction that consists of a co-occurrence-based approach to association finding and a machine learning-based approach to relation extraction. We evaluate our approach by comparing its results against a ground truth obtained using a pooling method.
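As a rough illustration of the association-finding half of the task, the sketch below ranks entities by how many documents they share with the input entity and returns the result as a list of weighted edges. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say which association measure is used, so a raw document-level co-occurrence count stands in for it; the function name `cooccurrence_network`, the `top_k` parameter, and the toy documents are hypothetical; entity recognition is assumed to have been done already; and the relation extraction step is not shown.

```python
from collections import Counter


def cooccurrence_network(input_entity, documents, top_k=5):
    """Return (input_entity, other_entity, count) edges for the top_k co-occurring entities.

    `documents` is a list of documents, each given as a list of entity names
    that were already recognized in it (an assumption of this sketch).
    """
    counts = Counter()
    for entities in documents:
        unique = set(entities)  # count each entity at most once per document
        if input_entity in unique:
            counts.update(unique - {input_entity})
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return [(input_entity, other, n) for other, n in ranked]


# Toy collection: each inner list holds the entities found in one document.
docs = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "D"],
]
edges = cooccurrence_network("A", docs)
```

Each returned triple can be read as one edge of the network around the input entity, with the count as its weight. A normalized measure would likely be a better choice on a real collection, since raw counts favor entities that are frequent everywhere.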
