A network-based model for high-dimensional information filtering

The Vector Space Model has been, and to a great extent still is, the de facto choice for profile representation in content-based Information Filtering. However, user profiles represented as weighted keyword vectors have inherent dimensionality problems. As the number of profile keywords increases, the vector representation becomes ambiguous, because the volume of the vector space and the number of possible keyword combinations grow exponentially. We argue that the complexity and dynamics of Information Filtering require user profile representations that are resistant to this "curse of dimensionality". A user profile has to be able to incorporate many features and to adapt to a variety of interest changes. We propose an alternative, network-based profile representation that meets these challenging requirements. Experiments show that the network profile representation captures additional information about a user's interests more effectively and thus achieves significant performance improvements over a vector-based representation comprising the same weighted keywords.
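To make the vector-based baseline concrete, the following is a minimal sketch of a weighted keyword profile scored against documents with cosine similarity. It is not the authors' implementation: the function name `cosine_score`, the example keywords, and the weights are all hypothetical. It shows one possible reading of the ambiguity described above, namely that a flat keyword vector cannot tell which keyword combinations belong together.

```python
import math


def cosine_score(profile, document):
    """Cosine similarity between two sparse keyword -> weight mappings."""
    dot = sum(w * document.get(term, 0.0) for term, w in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_p == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_p * norm_d)


# Hypothetical profile covering two unrelated interests with equal weights.
profile = {"java": 1.0, "compiler": 1.0, "coffee": 1.0, "roasting": 1.0}

# One document matches a coherent topic, the other mixes keywords across topics.
coherent_doc = {"java": 1.0, "compiler": 1.0}
mixed_doc = {"compiler": 1.0, "roasting": 1.0}

score_coherent = cosine_score(profile, coherent_doc)
score_mixed = cosine_score(profile, mixed_doc)

# Both documents receive the same score: the vector holds no information
# about which keywords are related to each other.
print(score_coherent, score_mixed)
```

Under these assumptions the two documents are indistinguishable to the profile, and the number of such interchangeable keyword combinations grows quickly as keywords are added. A representation that also records relations between keywords, such as the network-based one proposed here, has the extra information needed to separate them.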
