Data-driven approaches to information access

This paper summarizes three lines of research motivated by the practical problem of helping users find information in external data sources, most notably computers. The application areas include information retrieval, text categorization, and question answering. A common theme in these applications is that practical information access problems can be solved by analyzing the statistical properties of words in large volumes of real-world text. The same statistical properties constrain human performance; we therefore believe that solutions to practical information access problems can shed light on human knowledge representation and reasoning.
