A two-stage decision model for information filtering

Information mismatch and information overload are two fundamental issues that affect the effectiveness of information filtering systems. Although both term-based and pattern-based approaches have been proposed to address these issues, neither approach alone can reliably decide which information is relevant. This paper presents a novel two-stage decision model to solve them. The first stage is a rough analysis model that addresses the overload problem; the second stage is a pattern taxonomy mining model that addresses the mismatch problem. Experimental results on the RCV1 and TREC filtering topics show that the proposed model significantly outperforms state-of-the-art filtering systems.
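The overall shape of a two-stage filter can be illustrated with a small sketch. The abstract does not specify how either stage computes its decisions, so everything below is an assumption for illustration only. The names `Profile`, `term_score`, `pattern_score`, `two_stage_filter` and the `threshold` parameter are hypothetical. A single term-score cutoff stands in for the rough analysis model's rejection of clearly irrelevant documents. A sum of supports of matched term patterns stands in for the pattern taxonomy ranking. This is not the authors' method.

```python
from dataclasses import dataclass


# Hypothetical user profile learned from relevant training documents (assumption).
@dataclass
class Profile:
    term_weights: dict[str, float]          # stage 1: term-based weights
    patterns: dict[frozenset[str], float]   # stage 2: term pattern -> support


def term_score(doc_terms: set[str], profile: Profile) -> float:
    """Stage 1 score: sum of the weights of profile terms present in the document."""
    return sum(profile.term_weights.get(t, 0.0) for t in doc_terms)


def pattern_score(doc_terms: set[str], profile: Profile) -> float:
    """Stage 2 score: sum of supports of patterns fully contained in the document."""
    return sum(s for p, s in profile.patterns.items() if p <= doc_terms)


def two_stage_filter(docs: dict[str, set[str]], profile: Profile,
                     threshold: float) -> list[str]:
    # Stage 1 (overload): drop documents whose term score is below the threshold.
    # This cutoff is a simplified stand-in for a rough-set "negative region".
    survivors = {d: terms for d, terms in docs.items()
                 if term_score(terms, profile) >= threshold}
    # Stage 2 (mismatch): rank the remaining documents by pattern score, best first.
    return sorted(survivors, key=lambda d: pattern_score(survivors[d], profile),
                  reverse=True)


# Toy example with made-up documents and profile values.
docs = {
    "d1": {"oil", "price", "opec"},
    "d2": {"football", "score"},
    "d3": {"oil", "spill", "coast"},
}
profile = Profile(
    term_weights={"oil": 0.5, "price": 0.3, "opec": 0.4},
    patterns={frozenset({"oil", "price"}): 0.6,
              frozenset({"opec"}): 0.2,
              frozenset({"oil"}): 0.1},
)
ranking = two_stage_filter(docs, profile, threshold=0.4)
print(ranking)  # ['d1', 'd3']
```

In this sketch, the cheap first stage shrinks the candidate set before the costlier pattern matching runs. That mirrors the division of labour the abstract describes: overload is handled first, then mismatch.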
