Paper information - An application of cluster detection to text and picture processing

Syntactic information about a corpus of linguistic or pictorial data can be discovered by analyzing the statistics of the data. Given a corpus of text, one can measure the tendencies of pairs of words to occur in common contexts, and use these measurements to define clusters of words. Applied to basic English text, this procedure yields clusters which correspond very closely to the traditional parts of speech (nouns, verbs, articles, etc.). For FORTRAN text, the clusters obtained correspond to integers, operations, etc.; for English text regarded as a sequence of letters (or of phonemes) rather than words, the vowels and the consonants are obtained as clusters. Finally, applied to the gray shades in a digitized picture, the procedure yields slice levels which appear to be useful for figure extraction.
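The word-clustering idea in the abstract can be illustrated with a small sketch. The abstract does not state which similarity measure or clustering rule is used, so everything concrete here is an assumption made for illustration: contexts are taken to be the immediate left and right neighbors, similarity is cosine similarity over context counts, and words are grouped by single-link merging above a hypothetical `threshold`. The function names and the toy corpus are likewise invented for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Count the (side, neighbor) contexts in which each word occurs.

    Assumption: a "context" is the immediate left or right neighbor,
    with "<s>" and "</s>" standing in for sentence boundaries.
    """
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            vecs[word][("L", padded[i - 1])] += 1
            vecs[word][("R", padded[i + 1])] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(sentences, threshold=0.5):
    """Group words whose context vectors are similar (single-link merging).

    Assumption: `threshold` is an illustrative cutoff, not a value
    taken from the paper.
    """
    vecs = context_vectors(sentences)
    words = sorted(vecs)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster representative.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if cosine(vecs[u], vecs[v]) >= threshold:
                parent[find(u)] = find(v)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus, invented for this example.
sentences = [
    "the cat sees the dog".split(),
    "a dog sees a cat".split(),
    "the dog likes a cat".split(),
    "a cat likes the dog".split(),
]
print(cluster_words(sentences))
# → [['a', 'the'], ['cat', 'dog'], ['likes', 'sees']]
```

On this toy corpus the three groups line up with articles, nouns, and verbs, which mirrors the kind of result the abstract reports for English text. The same functions could in principle be run on sequences of letters or program tokens, though the behavior on real corpora would depend heavily on the choice of context and threshold.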
