Automatic Web Page Classification by Combining Feature Selection Techniques and Lazy Learners

The analysis of large datasets has become an important tool for understanding complex systems in areas such as economics, business, science, and engineering. Such datasets are often collected in a geographically distributed way and cannot in practice be gathered into a single repository. Applications that work with such datasets cannot control most aspects of the data's partitioning and arrangement. So far, data mining research has largely focused on extracting information from data physically located at one central site, and it often does not consider the resource constraints of distributed and mobile environments. A few attempts have also been made at parallel data mining. However, most real-life applications rely on data distributed across several locations. As a consequence, both new architectures and new algorithms are needed. This paper proposes a method that explores the capabilities of mobile agents to build an appropriate framework and an algorithm better suited to distributed data mining applications. It also presents a performance analysis and a comparison with an existing method of this kind.
