Intelligence Chinese Document Semantic Indexing System

With the rapid growth of the Internet, retrieving relevant information from this huge information space has become an increasingly important problem. This paper proposes an Intelligence Chinese Document Semantic Indexing System (ICDSIS), which integrates several new technologies to obtain good performance. ICDSIS is composed of four key procedures: a parallel, distributed, and configurable Spider gathers information; a multi-hierarchy document classification approach that incorporates information gain performs the initial processing of the gathered web documents; a swarm-intelligence-based document clustering method organizes the information; and a concept-based retrieval interface supports interactive retrieval by the user. ICDSIS is a comprehensive solution for information retrieval on the Internet.
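The abstract does not say how information gain enters the classification step. As a minimal sketch, assuming it is used in the common way to score how well a term separates document categories, the computation might look like the following. The function names, the toy documents, and the category labels are hypothetical, and the documents are assumed to be already segmented into sets of terms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)].

    docs   : list of sets of terms (assumed to be pre-segmented text)
    labels : list of category labels, one per document
    term   : the term whose discriminative power is scored
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]

    total = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy corpus: four segmented documents in two categories.
docs = [
    {"股票", "市场"},
    {"股票", "银行"},
    {"足球", "比赛"},
    {"足球", "市场"},
]
labels = ["finance", "finance", "sports", "sports"]

for term in ("股票", "市场"):
    print(term, information_gain(docs, labels, term))
```

In this toy corpus, "股票" appears only in the finance documents and so receives the maximum gain of one bit, while "市场" appears once in each category and receives a gain of zero. A classifier could keep only the highest-scoring terms as features at each level of the hierarchy.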
