Passage-Based Web Text Mining

The large amount of textual information on the Web is a valuable information resource. Traditional text mining research has treated a text document as a single unit of information. However, some Web documents are long and heterogeneous in content. This paper presents a new approach that applies the concept of a passage to Web text mining: a single Web text document is treated as several passages rather than as one text. The effectiveness of the approach is investigated using real Thai Web documents. As a preliminary step, we explore the influence of the passage-based method on the construction of association rules by comparing the rules it generates with those generated by the non-passage-based method.
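To make the comparison concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a passage is a fixed-size, non-overlapping window of tokens, that rules are restricted to single-term pairs, and that documents are already tokenized (Thai text would first need word segmentation, which is omitted here). The function names, the toy English documents, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_into_passages(tokens, size):
    """Split a token list into consecutive, non-overlapping windows.

    Assumption: a passage is a fixed-size token window; the source does
    not specify how passages are delimited.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def pair_rules(transactions, min_support, min_confidence):
    """Mine one-to-one association rules (x -> y) from term transactions.

    Each transaction is a list of terms (a whole document or a passage).
    Returns a sorted list of (x, y, support, confidence) tuples.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for terms in transactions:
        items = sorted(set(terms))  # each term counts once per transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check both directions of the pair.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical toy documents, each mixing two unrelated topics.
docs = [
    "rice export price farm football league goal match".split(),
    "football match goal score rice price export market".split(),
]

# Non-passage-based: each document is one transaction.
doc_rules = pair_rules(docs, min_support=0.5, min_confidence=0.8)

# Passage-based: each 4-token passage is one transaction.
passages = [p for d in docs for p in split_into_passages(d, 4)]
passage_rules = pair_rules(passages, min_support=0.5, min_confidence=0.8)

doc_pairs = {(x, y) for x, y, _, _ in doc_rules}
passage_pairs = {(x, y) for x, y, _, _ in passage_rules}
print("only at document level:", len(doc_pairs - passage_pairs))
print("passage-level rules:", len(passage_pairs))
```

In this toy data, whole-document transactions link terms from unrelated topics (for example `football` and `rice`) simply because both topics appear in the same document, while passage-level transactions keep only the within-topic pairs. Whether the same effect holds on real Web documents is the question the paper investigates.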
