Extracting and Cleaning Data from Semi-structure
Data mining helps uncover valuable information from large volumes of raw data. However, raw data usually arrives as text rather than in structured form, and it contains noise that makes analysis difficult. It is therefore vital to extract and clean raw data before in-depth analysis is applied. This paper presents a new approach to data extraction and cleaning for semi-structured Chinese texts. Experimental results show that it can effectively prepare data for mining.