Attention-Based Document Classifier Learning

We describe an approach for creating precise, personalized document classifiers based on the user's attention. The general idea is to observe which parts of a document the user attends to just before making a classification decision. Given this manual classification decision and the document parts on which it was based, we can learn precise classifiers. To observe the user's focus of attention, we use an unobtrusive eye tracking device and apply an algorithm for detecting reading behavior. On this basis, we extract terms that characterize the text parts of interest to the user and use them to describe the class to which the user assigned the document. With classifiers learned in this way, new documents can be classified automatically using passage-based retrieval techniques. A case study evaluating an attention-based term extraction method shows a strong improvement from incorporating the user's visual attention.
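
The pipeline outlined above can be illustrated with a minimal sketch. It is not the method evaluated in the case study. It assumes that reading detection has already produced a per-passage "was read" flag, and it substitutes plain term counts and cosine similarity for whatever weighting the actual system uses. All names (`tokenize`, `learn_profile`, `classify`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def learn_profile(passages, read_flags):
    """Build a class profile from the passages the user actually read.

    read_flags is assumed to come from an eye-tracking reading detector;
    passages that were not read contribute no terms.
    """
    profile = Counter()
    for passage, was_read in zip(passages, read_flags):
        if was_read:
            profile.update(tokenize(passage))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(passages, profiles):
    """Passage-based matching: a document is scored for each class by its
    single best-matching passage, not by the document as a whole."""
    best_label, best_score = None, 0.0
    for label, profile in profiles.items():
        score = max(
            (cosine(Counter(tokenize(p)), profile) for p in passages),
            default=0.0,
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Scoring by the best passage means that a new document can match a class even when only one short section resembles the text the user read during training, which is the motivation for using passage-level rather than document-level evidence.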
