Word Division in the Transcription of Chinese Script in the Title Fields of Bibliographic Records

ABSTRACT: The Library of Congress recently adopted the pinyin Romanization system for transcribing Chinese data in its bibliographic records. In its canonical form, pinyin aggregates Chinese “words” into single linguistic units, but pinyin entries can be constructed following either a monosyllabic or a polysyllabic pattern. Although the former is easier and less costly to implement, the latter is potentially more beneficial for end users, as it reduces ambiguity and generates a much larger variety of indexable terms. The current study investigates whether following the polysyllabic method improves retrieval efficiency and effectiveness in item-specific searching within online bibliographic databases. Analysis of the results revealed that aggregating monosyllables does improve efficiency significantly (p < .05), especially in keyword searches, while effectiveness remains largely unaffected.
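
The contrast between the two transcription patterns can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy titles, their word boundaries, the identifiers `T1` to `T3`, and the simple AND-style keyword search are hypothetical and are not taken from the study's data or method. The sketch only shows why a keyword index built from separate syllables may match more titles than one built from aggregated words.

```python
from collections import defaultdict

# Hypothetical toy titles (illustrative only). Each title is a list of words,
# and each word is a list of pinyin syllables with tone marks omitted.
TITLES = {
    "T1": [["zhong", "guo"], ["li", "shi"]],   # 中国历史
    "T2": [["guo", "ji"], ["zhong", "xin"]],   # 国际中心
    "T3": [["zhong", "wen"], ["yu", "fa"]],    # 中文语法
}


def monosyllabic(words):
    """Transcribe a title with every syllable as a separate index term."""
    return [syllable for word in words for syllable in word]


def polysyllabic(words):
    """Transcribe a title with the syllables of each word joined together."""
    return ["".join(word) for word in words]


def build_index(titles, transcribe):
    """Build a keyword index mapping each term to the set of title ids."""
    index = defaultdict(set)
    for title_id, words in titles.items():
        for term in transcribe(words):
            index[term].add(title_id)
    return index


def search(index, query_terms):
    """Return the ids of titles that contain every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


mono_index = build_index(TITLES, monosyllabic)
poly_index = build_index(TITLES, polysyllabic)

# A searcher looking for the title about "zhongguo" (中国):
mono_hits = search(mono_index, ["zhong", "guo"])
poly_hits = search(poly_index, ["zhongguo"])
print(sorted(mono_hits), sorted(poly_hits))
```

In this toy corpus the monosyllabic query also matches `T2`, because "guo" and "zhong" occur there as parts of two different words, whereas the polysyllabic query matches only `T1`. This is the kind of ambiguity the polysyllabic pattern is expected to reduce; the sketch makes no claim about the size of the effect in a real catalog.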
