Relevant data expansion for learning concept drift from sparsely labeled data

Changing interests are a natural phenomenon, and tracking them is an interesting problem because interests can emerge and diminish over different time frames. Doing so from only a few feedback examples is an even more important and challenging problem, because existing concept drift learning algorithms typically perform poorly when labeled data are sparse. This work presents a new computational Framework for Extending Incomplete Labeled Data Streams (FEILDS), which extends the capability of existing algorithms to learn concept drift from a few labeled examples. The system transforms the original input stream into a new stream that existing learning algorithms can track conveniently. Experimental results show that FEILDS significantly improves the performance of the Multiple Three-Descriptor Representation (MTDR) algorithm, the Rocchio algorithm, and window-based concept drift learning algorithms when learning from a sparsely labeled data stream, compared with their performance without FEILDS.
