A Survey on Web Content Mining and Extraction of Structured and Semistructured Data

With advances in information retrieval research and the phenomenal growth of the Web, today's Websites have become a key communication and information medium for various organizations. The Web also offers unprecedented opportunities and challenges for data mining. Various techniques are available to extract useful data from the Web. It is important for users to utilize this information effectively, as doing so helps them understand the structure of information on the Web more deeply and precisely. This paper surveys how Web content mining serves as an efficient tool for extracting structured and semistructured data and mining it into useful knowledge.
