Mining Rough Association from Text Documents for Web Information Gathering

It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because of duplications and ambiguities in data values (e.g., terms). Rough-set-based decision tables could be efficient tools for addressing this challenge. This paper first illustrates the relationship between decision tables and association mining, and proves that a decision rule is a kind of closed pattern. It then presents an alternative concept, rough association rules, to improve the quality of discovered knowledge in this area. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. The paper also reports experiments that compare the proposed method with association rule mining and decision tables, and the experimental results verify that the proposed approach is promising.
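
To make the idea of a premise that pairs a term set with a weight distribution more concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm: the function name `rough_premises`, the toy documents, and the choice to merge documents with identical term sets and normalize their summed term frequencies are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def rough_premises(docs):
    """Group documents by term set and attach a normalized weight distribution.

    `docs` is a list of dicts mapping term -> frequency (hypothetical input
    format). Documents sharing the same set of terms are merged, and their
    summed frequencies are normalized so the weights add up to 1.
    """
    grouped = defaultdict(Counter)
    for term_counts in docs:
        # The term set alone is what a normal association rule premise keeps.
        grouped[frozenset(term_counts)].update(term_counts)

    premises = {}
    for terms, counts in grouped.items():
        total = sum(counts.values())
        # The weight distribution is the extra information in the premise.
        premises[terms] = {t: counts[t] / total for t in sorted(terms)}
    return premises


# Toy data, invented for illustration.
docs = [
    {"rough": 2, "set": 1},
    {"rough": 1, "set": 2},
    {"web": 3, "mining": 1},
]
premises = rough_premises(docs)
for terms, weights in premises.items():
    print(sorted(terms), weights)
```

In this sketch, two documents with the same terms collapse into one premise, while the weights record how strongly each term contributes, which is the kind of additional detail a plain term-set premise would discard.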
