Constructing a true LCSH tree of a science and engineering collection

The Library of Congress Subject Headings (LCSH) is a subject structure used to index large library collections throughout the world. Browsing a collection through LCSH is difficult with current online tools, in part because users cannot draw on their existing experience of navigating file hierarchies on their hard drives. The cause is a set of inconsistencies in the LCSH structure, which does not adhere to the rules that define a tree structure. This article proposes a method to adapt the LCSH structure so that it reflects a real-world collection from the domain of science and engineering. The adapted structure is then transformed into a valid tree by an automatic process. Analysis of the resulting LCSH tree shows a large and complex structure. Analysis of the distribution of information within the LCSH tree reveals a power-law distribution in which the vast majority of subjects contain few information items and a few subjects contain the vast majority of the collection. © 2012 Wiley Periodicals, Inc.
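To make concrete what "the rules that define a tree structure" can mean, the following minimal sketch checks a set of broader/narrower links against three common tree conditions: exactly one root, at most one parent per node, and no cycles. It is an illustration only, not the article's method; the function name `tree_violations` and the sample headings and links are hypothetical and are not taken from the LCSH data.

```python
from collections import defaultdict


def tree_violations(edges):
    """Return reasons why (parent, child) links do not form a rooted tree.

    An empty list means the links form a valid tree. This is a hypothetical
    helper for illustration, not the procedure used in the article.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((parent, child))

    problems = []

    # Rule 1: a tree has exactly one node without a parent (the root).
    roots = sorted(n for n in nodes if not parents[n])
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")

    # Rule 2: every other node has exactly one parent (no polyhierarchy).
    for node in sorted(nodes):
        if len(parents[node]) > 1:
            problems.append(f"{node!r} has {len(parents[node])} parents")

    # Rule 3: no cycles. Any node that cannot be reached from a root
    # must lie on a cycle or below one.
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    unreachable = nodes - seen
    if unreachable:
        problems.append(
            f"{len(unreachable)} node(s) lie on or below a cycle"
        )

    return problems


# Made-up example links, for illustration only.
valid = [("Science", "Physics"), ("Science", "Chemistry"),
         ("Physics", "Optics")]
two_parents = valid + [("Physics", "Physical chemistry"),
                       ("Chemistry", "Physical chemistry")]
cyclic = [("A", "B"), ("B", "A")]

print(tree_violations(valid))
print(tree_violations(two_parents))
print(tree_violations(cyclic))
```

A heading with several broader terms, as in the second example, is the kind of inconsistency that prevents a subject structure from being browsed like a file hierarchy.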
