Information Extraction, Real-Time Processing and DW2.0 in Operational Business Intelligence

In today’s enterprise, business processes and business intelligence applications need to access both structured and unstructured information so that business transactions and analytics can be extended with as much adjacent data as possible. Unfortunately, this information is scattered across many places and takes many forms, and it is managed by different database systems, document management systems, and file systems. Companies end up building one-of-a-kind solutions to integrate these disparate systems and make the right information available at the right time and in the right form for their business transactions and analytical applications. Our goal is to create an operational business intelligence platform that manages all the information required by business transactions and combines facts extracted from unstructured sources with data coming from structured sources along the DW2.0 pipeline to enable actionable insights. In this paper, we give an overview of the platform functionality and architecture, focusing in particular on the information extraction and analytics layers and their application to situational awareness for medical response to epidemics.
