Toward total business intelligence incorporating structured and unstructured data

As the amount of data inside and outside an enterprise grows rapidly, obtaining total business intelligence increasingly requires analyzing all of it seamlessly. This data can be classified into two categories: structured and unstructured. In particular, since most valuable business information is encoded in unstructured text documents, including Web pages on the Internet, we need a specialized Text OLAP solution that performs multidimensional analysis on text documents in the same way as on structured relational data. Because text mining and information retrieval are the major technologies for handling text data, we first review representative works selected to demonstrate how these technologies can be applied to Text OLAP. We then survey representative works selected to demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present an architecture for a total business intelligence platform that incorporates structured and unstructured data. We expect that the proposed architecture will provide an effective platform toward total business intelligence. It integrates information retrieval, text mining, and information extraction technologies together with relational OLAP technologies.
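The following minimal Python sketch illustrates what multidimensional analysis on text documents can look like. It assumes that each document carries structured dimension attributes alongside its text; the `region` and `quarter` fields and the sample documents are hypothetical. Documents are grouped by any subset of dimensions, much as a relational GROUP BY would group rows. Each cell's text is then aggregated into its most frequent terms. This term-frequency measure is a simple illustrative stand-in for the text measures used in Text OLAP. It is not the method of any specific system surveyed here.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical structured dimensions (e.g. region, quarter) plus raw text.
    dims: dict
    text: str


# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on"}


def tokenize(text):
    # Lowercase alphabetic tokens with stopwords removed.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def text_cube(docs, group_by, k=3):
    """Roll documents up to the dimensions in `group_by` and return, for each
    cell, a (document count, k most frequent terms) pair."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for doc in docs:
        cell = tuple(doc.dims[d] for d in group_by)
        term_counts[cell].update(tokenize(doc.text))
        doc_counts[cell] += 1
    return {
        cell: (doc_counts[cell], [term for term, _ in counts.most_common(k)])
        for cell, counts in term_counts.items()
    }


# Hypothetical sample data for illustration only.
sample_docs = [
    Document({"region": "EU", "quarter": "Q1"}, "Delivery delays and delivery complaints"),
    Document({"region": "EU", "quarter": "Q2"}, "Refund requests after delivery delays"),
    Document({"region": "US", "quarter": "Q1"}, "Praise for fast support and support staff"),
]

if __name__ == "__main__":
    print(text_cube(sample_docs, ["region"], k=1))
    # prints {('EU',): (2, ['delivery']), ('US',): (1, ['support'])}
```

Calling `text_cube(sample_docs, ["region"])` rolls the documents up to regions. Passing `["region", "quarter"]` drills down to finer cells, and passing `[]` collapses everything into a single cell. This mirrors roll-up and drill-down on a relational cube, with a text measure in place of a numeric one.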
