Bayesian graphical models for adaptive filtering

A personal information filtering system monitors an incoming document stream to find documents that match the information needs specified in user profiles. The most challenging aspect of adaptive filtering is learning user profiles efficiently and effectively from very limited user supervision. To overcome this challenge, a system needs to: use a robust learning algorithm that works reasonably well when training data is scarce and becomes more effective as more training data arrives; explore what a user likes while satisfying the user's immediate information need, trading off exploration and exploitation; consider aspects of a document beyond relevance, such as novelty, readability, and authority; use multiple forms of evidence, such as user context and implicit feedback, while interacting with a user; and robustly handle the scenarios that arise in an operational environment, such as missing data. This dissertation uses Bayesian graphical modeling as a unified framework for filtering. We customize the framework to the filtering domain and develop a set of solutions that enable us to build a filtering system with the desired characteristics in a principled way. We evaluate and justify these solutions on a large and diverse set of standard and new adaptive filtering test collections. First, we develop a novel technique that incorporates an IR expert's heuristic algorithm as a Bayesian prior into a machine learning classifier to improve the robustness of a filtering system. Second, we derive a novel model quality measure, based on the uncertainty of model parameters, to trade off exploration and exploitation and to perform active learning. Third, we carry out a user study with a real web-based personal news filtering system and more than 20 users. With the data collected in the user study, we explore how to use existing graphical modeling algorithms to learn the causal relationships between multiple forms of evidence and to improve the filtering system's performance using this evidence.
