Discovering Relations by Entity Search in Lightweight Semantic Text Graphs

Entity search is becoming a popular alternative to full-text search. Recently Google released its entity search based on confirmed, human-generated data such as Wikipedia. In spite of these developments, discovering and searching for entities and their relations in unstructured text remains a major challenge in information retrieval and information extraction. This paper addresses that challenge, focusing specifically on entity relation discovery. Unstructured text is processed with simple information extraction methods, the results are used to build lightweight semantic graphs, and these graphs are then reused for entity relation discovery by applying algorithms from graph theory. User interaction with the semantic graphs is also an important part of the approach, since it can significantly improve both the information extraction results and entity relation search. Entity relations can be discovered by various text mining methods, but the advantage of the presented method lies in the similarity between the lightweight semantics extracted from text and the information networks available as structured data. Both kinds of graph have similar properties, so similar relation discovery algorithms can be applied to them. In addition, the two kinds of graph data can be integrated, which brings further benefit. We provide both relevance and performance evaluations of the approach and showcase it in several use case applications.
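The pipeline outlined in the abstract (simple extraction, a lightweight graph, then a graph algorithm) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two regular-expression patterns, the document-to-entity graph layout, the names `extract_entities`, `build_graph`, and `find_relation`, and the choice of breadth-first search as the graph algorithm.

```python
import re
from collections import defaultdict, deque

# Hypothetical extraction patterns (assumption); a real system would use
# richer rules or gazetteers than these two regular expressions.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "person": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def extract_entities(text):
    """Return the set of (type, value) pairs matched by the patterns."""
    return {
        (etype, match.group(0))
        for etype, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    }


def build_graph(documents):
    """Link each document node to every entity it mentions (undirected)."""
    graph = defaultdict(set)
    for doc_id, text in documents.items():
        doc_node = ("doc", doc_id)
        for entity in extract_entities(text):
            graph[doc_node].add(entity)
            graph[entity].add(doc_node)
    return graph


def find_relation(graph, source, target):
    """Breadth-first search; return a shortest path as a list, or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up sample documents, used only to exercise the sketch.
documents = {
    "d1": "Alice Smith wrote to bob@example.com about the contract.",
    "d2": "bob@example.com forwarded the contract to Carol Jones.",
}
graph = build_graph(documents)
path = find_relation(graph, ("person", "Alice Smith"), ("person", "Carol Jones"))
```

In this toy input the two people never appear in the same document, yet the path connects them through the shared email address and the two documents that mention it. That is the kind of indirect relation a graph search can surface where a plain full-text query would not.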
