A Web Text Mining Flexible Architecture

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from non-structured or semi-structured data. This aspect is fundamental because much of the Web information is semi-structured, owing to the nested structure of HTML code; much of it is linked; and much of it is redundant. Web Text Mining supports the whole knowledge mining process through the mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The process is based on a flexible architecture and is implemented in four steps that examine Web content and extract useful hidden information through mining techniques. Our Web Text Mining prototype starts by retrieving Web job offers, from which a Text Mining process draws out the information needed to classify them quickly: essentially, the job offer place and the skills.

Keywords—Web text mining, flexible architecture, knowledge discovery.
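The four steps are not detailed here, so the following is only a minimal sketch, under our own assumptions, of the final stage described: pulling the job offer place and the skills out of a retrieved Web job offer. The vocabularies `PLACES` and `SKILLS` and the helpers `html_to_text`, `find_terms` and `extract_job_features` are hypothetical, and plain dictionary matching stands in for whatever mining techniques the prototype actually uses.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects every text node of an HTML page (no special handling of <script>/<style>)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Flattens the nested HTML structure into plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical vocabularies; a real system would load them from a domain ontology.
PLACES = {"bari", "milano", "roma", "torino"}
SKILLS = {"java", "sql", "python", "html", "project management"}


def find_terms(text: str, vocabulary: set[str]) -> list[str]:
    """Returns the vocabulary terms occurring as whole words in the text, sorted."""
    lowered = text.lower()
    found = [term for term in vocabulary
             if re.search(r"\b" + re.escape(term) + r"\b", lowered)]
    return sorted(found)


def extract_job_features(html: str) -> dict[str, list[str]]:
    """Hypothetical last step: classifies a job offer by place and skills."""
    text = html_to_text(html)
    return {"places": find_terms(text, PLACES), "skills": find_terms(text, SKILLS)}


if __name__ == "__main__":
    offer = ("<html><body><h1>Java Developer</h1>"
             "<p>Location: Bari. Required: Java, SQL and project management.</p>"
             "</body></html>")
    print(extract_job_features(offer))
    # {'places': ['bari'], 'skills': ['java', 'project management', 'sql']}
```

Matching whole words against fixed vocabularies keeps the sketch small; note that the HTML tags themselves are discarded, so a skill such as "html" only matches when it appears in the offer's text.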
