Automated text categorization of bibliographic records

LIVA (Library Information Analysis and Visualization) is a research and development project of the Swedish School of Library and Information Science in Borås (SSLIS), Bibliotekscentrum Sverige AB, and BTJ, carried out in cooperation with a number of project libraries and funded for 2005-2007 by the Knowledge Foundation (KKS). The project analyses data from the project libraries and content providers, prominently Lund Public Library, Nordiska museet, SCB (Statistics Sweden), TPB (The Swedish Library of Talking Books and Braille), Southern Älvsborg Hospital Library, and Dandelon, a German portal and search engine. Its goal is to bring competitive functionality in language technology, classification research, information retrieval (IR) and information visualization to the OPACs of special and public libraries. The interaction between these components is a delicate area of study and the subject of active research worldwide. As a point of departure, we assume that the quality of classification affects both IR results and information searching by browsing, as enabled by information visualization. In an automated environment, classification quality in turn depends heavily on linguistic pre-processing of the input material, provided for example by language technology tools. One might say that the outcome of knowledge organization depends on the capabilities of the linguistic and statistical tools applied.
