Mixed-initiative clustering

Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other. The first contribution of this thesis is a framework for mixed-initiative clustering. The framework consists of machine learning and teaching phases and user learning and teaching phases, connected in an interactive loop that allows bi-directional communication. The communication languages in each direction define the types of information exchanged through the interface. Coordination between the two communication languages and the adaptation capability of the machine's clustering model is the key to building a mixed-initiative clustering system. The second contribution is the successful construction of several systems using our proposed framework. Two of these systems have incrementally enriched communication languages: one enables user feedback on features for non-hierarchical clustering, and the other accepts user feedback on hierarchical clustering results. These systems validate our framework and demonstrate the possibility of developing machine learning algorithms that work with conceptual properties. The third contribution is a study of how to enable real-time interaction in our full-fledged mixed-initiative clustering system, from which we derive several guidelines on practical issues that developers of mixed-initiative learning systems may encounter. The fourth contribution is the design of user studies for examining the effectiveness of a mixed-initiative clustering system. We design the studies around two scenarios: a learning scenario, in which a user develops a topic ontology from an unfamiliar data set, and a teaching scenario, in which a user already knows the ontology and wants to transfer this knowledge to a machine.
Results of the user studies demonstrate that mixed-initiative clustering has advantages over non-mixed-initiative approaches, both in helping users learn an ontology and in helping users teach a known ontology to a machine.
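The interactive loop of machine learning, machine teaching, and user teaching can be illustrated with a deliberately simplified sketch. This is not the clustering model used in the thesis. It assumes a k-means-style nearest-centroid clustering over raw term counts, and it assumes that user feedback on features (pinning a keyword to a cluster) is applied as a fixed additive weight `boost` on that keyword in the cluster's centroid. The function names `centroids`, `machine_learn`, and `machine_teach`, and the toy documents, are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Doc = List[str]  # a document represented as a list of tokens


def centroids(docs: List[Doc], assign: List[int], k: int) -> List[Dict[str, float]]:
    """Average term counts of the documents assigned to each cluster."""
    sums = [Counter() for _ in range(k)]
    sizes = [0] * k
    for doc, c in zip(docs, assign):
        sums[c].update(doc)
        sizes[c] += 1
    return [{w: n / max(sizes[c], 1) for w, n in sums[c].items()} for c in range(k)]


def machine_learn(docs: List[Doc], assign: List[int], k: int,
                  feature_feedback: Dict[int, List[str]],
                  boost: float = 5.0, max_iters: int = 10) -> List[int]:
    """Machine learning phase (hypothetical): nearest-centroid reassignment in
    which every user-pinned keyword gets an extra weight `boost` in its cluster."""
    for _ in range(max_iters):
        cents = centroids(docs, assign, k)
        for c, words in feature_feedback.items():
            for w in words:
                cents[c][w] = cents[c].get(w, 0.0) + boost
        new_assign = []
        for doc in docs:
            tf = Counter(doc)
            # Dot product between the document's term counts and each centroid.
            scores = [sum(n * cent.get(w, 0.0) for w, n in tf.items()) for cent in cents]
            new_assign.append(max(range(k), key=scores.__getitem__))
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
    return assign


def machine_teach(docs: List[Doc], assign: List[int], k: int, top: int = 3) -> List[List[str]]:
    """Machine teaching phase (hypothetical): show the top keywords per cluster."""
    cents = centroids(docs, assign, k)
    return [sorted(cent, key=lambda w: (-cent[w], w))[:top] for cent in cents]


docs = [
    ["budget", "meeting", "report"],
    ["soccer", "match", "goal"],
    ["budget", "forecast", "report"],
    ["soccer", "goal", "team"],
]
# User teaching phase: the user pins one keyword to each cluster.
feedback = {0: ["budget"], 1: ["soccer"]}
assign = machine_learn(docs, [0, 0, 1, 1], 2, feedback)
summary = machine_teach(docs, assign, 2)
print(assign)   # [0, 1, 0, 1]
print(summary)  # [['budget', 'report', 'forecast'], ['goal', 'soccer', 'match']]
```

Starting from a poor initial assignment, one round of keyword feedback is enough for this toy example to separate the two topics, and the keyword summary is what the user would inspect before giving the next round of feedback. A real system would also need other feedback types, such as the hierarchical feedback described above, and a more principled way to weigh feedback than a fixed boost.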