Updateable PAT-Tree Approach to Chinese Key Phrase Extraction using Mutual Information: A Linguistic Foundation for Knowledge Management

There has been renewed research interest in statistical approaches to extracting key phrases from Chinese documents, because existing approaches do not allow online frequency updates after phrases have been extracted. This results in inaccurate, partial extraction. In this paper, we present an updateable PAT-tree approach. In our experiment, we compared our approach with that of Lee-Feng Chien and observed an improvement in recall from 0.19 to 0.43 and in precision from 0.52 to 0.70. This paper also reviews the requirements for a data structure that facilitates implementation of any statistical approach to key-phrase extraction, including the PAT-tree, the PAT-array, and the suffix array with semi-infinite strings.
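To make the frequency-update problem concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a plain sorted list of semi-infinite strings (a suffix array) in place of a PAT-tree, an association score of the form f_c / (f_a + f_b - f_c) chosen only for illustration, and a crude "update" that masks the extracted phrase and rebuilds the index. All function names and the sample text are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Start positions of all semi-infinite strings (suffixes), sorted."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def frequency(text, sa, s):
    """Count occurrences of s by binary search over the sorted suffixes.

    Truncating sorted suffixes to len(s) keeps them in sorted order, so
    bisect applies. Building the prefix list is O(n); a real index would
    search in place.
    """
    prefixes = [text[i:i + len(s)] for i in sa]
    return bisect_right(prefixes, s) - bisect_left(prefixes, s)


def association(text, sa, phrase):
    """Illustrative association score (an assumption, not the paper's formula).

    f_c: frequency of the phrase
    f_a: frequency of the phrase without its last character
    f_b: frequency of the phrase without its first character
    """
    f_c = frequency(text, sa, phrase)
    if f_c == 0:
        return 0.0
    f_a = frequency(text, sa, phrase[:-1])
    f_b = frequency(text, sa, phrase[1:])
    return f_c / (f_a + f_b - f_c)


def extract_and_update(text, phrase):
    """Mask an extracted phrase and rebuild the index.

    Stand-in for an online update: occurrences of shorter strings that
    lie inside the extracted phrase no longer count afterwards.
    """
    masked = text.replace(phrase, "\0")
    return masked, build_suffix_array(masked)


text = "知识管理与知识管理系统及管理"
sa = build_suffix_array(text)

before = frequency(text, sa, "管理")          # counts hits inside 知识管理 too
score = association(text, sa, "知识管理")

new_text, new_sa = extract_and_update(text, "知识管理")
after = frequency(new_text, new_sa, "管理")   # only the standalone hit remains
```

In this sample, the count for 管理 drops once 知识管理 has been extracted, which is the kind of correction that a static frequency table cannot make. Rebuilding the whole index after each extraction is only for illustration; the point of an updateable structure is to avoid that cost.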
