Integrating Web-Based Data into a Data Warehouse

Abstract

Data warehousing technologies have matured enough to store and process huge data sets efficiently. As a result, the central challenge of data warehousing has shifted from increasing data processing capacity to enriching data resources so that they can provide better decision-making support. Some organizations have reportedly begun planning to bring Web data into their data warehouse systems to meet this challenge, because the vast amount of information available online has made the Internet the largest external database for every organization. However, no systematic guideline yet exists to support such efforts. To fill this void, we introduce Web integration as a strategy for merging data warehouses and the Web, with an emphasis on acquiring Web data into data warehouses effectively and efficiently. We also point out that the critical step in Web integration is acquiring genuinely valuable business data from the Web, and we offer a framework for determining the business value of Web data to facilitate Web integration efforts.
