A Noise Reduction Approach based on n x 1 Table and XSL Display Method for Efficient Web Data Extraction

A web page, as a source of information, consists of many parts, of which only some carry information useful to a particular application; the rest is noise. Users therefore need an effective technique for extracting the useful information from the page as a whole. Removing these noise patterns from the web page can improve the efficiency of web data extraction. This work proposes an approach for removing local noise from a given web page, based on an n x 1 table and an XSL display method with a filter feature, to improve the efficiency of web data extraction.
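The abstract does not say how the n x 1 table is built or what the filter tests, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each top-level `<div>` block of a well-formed XHTML page becomes one row of an n x 1 table (n rows, one column), and that a row counts as local noise when it contains one of a few hypothetical keywords (`NOISE_KEYWORDS`). Python's standard library has no XSLT processor, so the XSL display step is approximated by rendering the kept rows back into a one-column HTML table with `xml.etree.ElementTree`. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical noise markers: the abstract does not specify the filter criterion.
NOISE_KEYWORDS = ("advertisement", "copyright", "subscribe", "login")


def page_to_n_by_1_table(xhtml: str) -> list[str]:
    """Turn each top-level <div> under <body> into one row of an n x 1 table (assumed segmentation)."""
    root = ET.fromstring(xhtml)
    body = root.find("body")
    rows = []
    for block in body.findall("div"):
        # Join all nested text and collapse runs of whitespace.
        text = " ".join("".join(block.itertext()).split())
        if text:
            rows.append(text)
    return rows


def is_noise(row: str) -> bool:
    """Hypothetical filter: flag a row if it mentions any noise keyword."""
    lowered = row.lower()
    return any(keyword in lowered for keyword in NOISE_KEYWORDS)


def filter_table(rows: list[str]) -> list[str]:
    """Keep only the rows the filter does not flag as noise."""
    return [row for row in rows if not is_noise(row)]


def render_n_by_1(rows: list[str]) -> str:
    """Stand-in for the XSL display step: emit the kept rows as a one-column HTML table."""
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        td = ET.SubElement(tr, "td")
        td.text = row
    return ET.tostring(table, encoding="unicode")


SAMPLE_PAGE = """<html><body>
<div>Login | Subscribe</div>
<div>Main article text about web data extraction.</div>
<div>Advertisement: buy now</div>
<div>Copyright 2011</div>
</body></html>"""

rows = page_to_n_by_1_table(SAMPLE_PAGE)  # 4 rows, one per <div>
kept = filter_table(rows)                 # only the article row survives
html = render_n_by_1(kept)
print(html)
```

Keeping the table one column wide means the filter can judge each block independently, row by row. A real implementation would replace the keyword test with whatever criterion the filter feature actually uses and would apply a genuine XSL stylesheet for display.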
