A model for fast web mining prototyping

Web mining is a computation-intensive task, even after the mining tool itself has been developed. Most mining software is developed ad hoc and is usually neither scalable nor reused for other mining tasks. The objective of this paper is to present a model for fast Web mining prototyping, referred to as WIM (Web Information Mining). The underlying conceptual model of WIM provides its users with a level of abstraction appropriate for prototyping and experimentation throughout the Web data mining task. Abstracting away the idiosyncrasies of raw Web data representations facilitates the inherently iterative mining process. We present the WIM conceptual model, its associated algebra, and the software architecture of the WIM tool, which implements the model. We also illustrate how the model can be applied to real Web data mining tasks. Experimentation with WIM in real use cases has shown that it significantly facilitates Web mining prototyping.
