Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights

arXiv:2104.08936v1 [cs.AI] 18 Apr 2021

The growing quantity and complexity of data make it difficult for people to consume information and respond in a timely manner. For businesses in domains with rapidly changing rules and regulations, failure to identify changes can be costly. In contrast to expert analysis or the development of domain-specific ontologies and taxonomies, we use a task-based approach to fulfill specific information needs within a new domain. Specifically, we propose to extract task-based information from incoming instance data. A pipeline built from state-of-the-art NLP technologies, including a bi-LSTM-CRF model for entity extraction, attention-based deep Semantic Role Labeling, and an automated verb-based relationship extractor, automatically extracts an instance-level semantic structure. Each instance is then combined with a larger, domain-specific knowledge graph to produce new and timely insights. Preliminary results, validated manually, show the methodology to be effective at extracting the specific information needed to complete end use-cases.

Introduction: The sheer growth in unstructured content is overwhelming the ability of businesses to respond effectively. For example, there are over 180,000 pages of regulation in the Federal Register, and they are updated frequently. In the banking industry alone, the costs of staying compliant with regulatory requirements, from local to global, are expected to exceed $100 billion annually by 2020 [1]. Staying on top of this depends on a combination of human and machine approaches. The goal of this study is to extract and infer just enough to focus human attention on the right content. For example, given news of a regulatory change, can we understand just enough to infer which businesses might be impacted and who needs to be notified? At a basic level, this work is an effort to ease this information-consumption burden for specific tasks in a domain.
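The notification example above can be made concrete with a minimal sketch. It assumes the extraction pipeline has already reduced a news item to (subject, verb, object) triples and that the domain knowledge graph is also stored as triples. Every entity, relation name (`amends`, `applies_to`, `operates_in`, `compliance_contact`), and function here is hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical domain knowledge graph as (subject, relation, object) triples.
DOMAIN_KG = [
    ("Rule-42", "applies_to", "mortgage lending"),
    ("Bank A", "operates_in", "mortgage lending"),
    ("Bank B", "operates_in", "auto lending"),
    ("Bank A", "compliance_contact", "alice@example.com"),
    ("Bank B", "compliance_contact", "bob@example.com"),
]

# Hypothetical instance-level triple, standing in for the output of the
# entity, SRL, and verb-based relation extraction steps on one news item.
INSTANCE_TRIPLES = [("Agency X", "amends", "Rule-42")]


def build_index(triples):
    """Map each (subject, relation) pair to the set of its objects."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)
    return index


def impacted(instance_triples, domain_triples):
    """Return {business: [contacts]} for businesses touched by an amendment."""
    graph = list(domain_triples) + list(instance_triples)
    index = build_index(graph)
    notify = {}
    for _, verb, regulation in instance_triples:
        if verb != "amends":  # assumed trigger verb for a regulatory change
            continue
        sectors = index[(regulation, "applies_to")]
        for subject, relation, obj in graph:
            if relation == "operates_in" and obj in sectors:
                notify[subject] = sorted(index[(subject, "compliance_contact")])
    return notify


result = impacted(INSTANCE_TRIPLES, DOMAIN_KG)
print(result)
```

The point of the sketch is that the instance triple alone says nothing about who is affected; the answer only appears once it is joined with the domain graph's `applies_to` and `operates_in` edges.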