A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project

This research presents preliminary results from the semantic retrieval component of the Illinois Digital Library Initiative (DLI) project. Using a variation of automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across them could help alleviate the vocabulary difference problem that is evident in large-scale information retrieval. To address the scalability issues of large-scale information retrieval and analysis in the Illinois DLI project, we conducted experiments with the concept space approach on parallel supercomputers. Our test collection included computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both of which had previously been tested in the biology domain. Initial test results on a 512-node CM-5 and a 16-processor SGI Power Challenge were promising.
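The abstract names co-occurrence analysis over automatically indexed documents as the core of the concept space approach, and parallel hardware as the way to scale it. The sketch below illustrates both ideas under loudly stated assumptions: documents are assumed to be already indexed into `{term: frequency}` dictionaries; the asymmetric weighting is an assumed tf-idf-style cluster function chosen for illustration, not necessarily the exact formula used in this project; and Python threads stand in for the nodes of a CM-5 or SGI Power Challenge. The function names (`partition_stats`, `merge_stats`, `concept_space`) and the sample data are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition_stats(docs):
    """Map step: raw counts for one partition of indexed documents.

    Each document is a dict {term: term_frequency}; producing these dicts
    (automatic indexing) is assumed to have happened already.
    """
    tf_sum, df = Counter(), Counter()        # per-term totals
    pair_tf, pair_df = Counter(), Counter()  # per unordered term pair
    for doc in docs:
        for term, tf in doc.items():
            tf_sum[term] += tf
            df[term] += 1
        # Sorting gives each unordered pair a single canonical key (a, b).
        for a, b in combinations(sorted(doc), 2):
            pair_tf[(a, b)] += min(doc[a], doc[b])
            pair_df[(a, b)] += 1
    return len(docs), tf_sum, df, pair_tf, pair_df


def merge_stats(parts):
    """Reduce step: counts from independent partitions simply add up."""
    n = 0
    totals = [Counter(), Counter(), Counter(), Counter()]
    for part_n, *counters in parts:
        n += part_n
        for total, counts in zip(totals, counters):
            total.update(counts)  # Counter.update adds counts
    return (n, *totals)


def concept_space(docs, n_workers=4):
    """Return asymmetric weights {(j, k): strength of association j -> k}.

    Assumed weighting (tf-idf style, for illustration only):
      W(j -> k) = [sum_i min(tf_ij, tf_ik) * log10(N / df_jk)]
                  / [sum_i tf_ij * log10(N / df_j)]
                  * log10(N / df_k) / log10(N)
    """
    # Round-robin split; each chunk plays the role of one processor's share.
    chunks = [docs[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        n, tf_sum, df, pair_tf, pair_df = merge_stats(
            pool.map(partition_stats, chunks))
    if n < 2:
        raise ValueError("need at least two documents")
    log_n = math.log10(n)
    idf = {t: math.log10(n / df[t]) for t in df}
    weights = {}
    for (a, b), co_tf in pair_tf.items():
        co = co_tf * math.log10(n / pair_df[(a, b)])
        for j, k in ((a, b), (b, a)):  # same numerator, two directions
            denom = tf_sum[j] * idf[j]
            if denom > 0:  # skip terms that occur in every document
                weights[(j, k)] = co / denom * idf[k] / log_n
    return weights


# Hypothetical toy collection standing in for indexed INSPEC abstracts.
SAMPLE_DOCS = [
    {"parallel": 2, "supercomputer": 1},
    {"parallel": 1, "retrieval": 2},
    {"retrieval": 1, "thesaurus": 1},
    {"thesaurus": 2, "retrieval": 1, "indexing": 1},
]

if __name__ == "__main__":
    w = concept_space(SAMPLE_DOCS, n_workers=2)
    for (j, k), weight in sorted(w.items(), key=lambda kv: -kv[1]):
        print(f"{j:>13} -> {k:<13} {weight:.3f}")
```

Because every statistic gathered in the map step is a plain count, partitions never need to communicate until the merge, and only the final weighting needs the global document count and document frequencies; the resulting weights therefore do not depend on how documents are split across workers. On a distributed-memory machine the merge would become a global reduction. In CPython, real CPU parallelism would call for a process pool rather than threads, since threads share the global interpreter lock.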
