Effective profiling of consumer information retrieval needs: a unified framework and empirical comparison

Due to the overwhelming volume of information that is increasingly available, many people rely on current awareness systems to keep abreast of the latest developments in the fields they are interested in, as evidenced by the popularity of subscriptions to news-monitoring and digital library services. The success of these services, however, often depends on effectively acquiring users' standing interests, as represented in personal profiles. Our objective in this paper is twofold. First, we introduce a new method for profile generation and compare it against other well-known methods, with promising results. Second, although various methods have been proposed in the information retrieval and machine learning literature to address the issue of profiling, a unified framework and a systematic cross-system comparison that would help users, especially service providers, determine the most effective way of profiling consumers are still lacking. We try to fill this gap by examining these methods from a more integrated point of view based on statistical contingency theory. Variations of these methods are then systematically tested on three well-known routing systems, and the results are analyzed and reported.
