Researching persons & organizations: AWAKE: From text to an entity-centric knowledge base

We describe a pilot experiment building a capability to automatically read documents, develop a knowledge base, support analytics, and visualize the information found. The capability allows someone researching a topic of interest to focus on analysis and synthesis rather than on reading. We show how information from multiple modalities (speech, text, structured databases) and multiple approaches (ontology-driven and open information extraction) can be fused to create a resource about both previously known and novel entities. We describe an extensible framework for language understanding tools that allows for scalability, plug-and-play of alternative components, and incorporation of additional input streams, including video, images, and foreign-language text.
