Converting a Historical Architecture Encyclopedia into a Semantic Knowledge Base

Digitizing a historical document using ontologies and natural language processing (NLP) techniques can transform it from arcane text into a useful knowledge base.

The Handbook on Architecture (Handbuch der Architektur) was perhaps one of the most ambitious publishing projects ever. Like a 19th-century Wikipedia, it attempted nothing less than a full account of all architectural knowledge available at the time, both past and present. It covers topics from Greek temples to contemporary hospitals and universities; from the design of individual construction elements such as window sills to large-scale town planning; from physics to design; from planning to construction. It also discusses architectural history and styles and a multitude of other topics, such as building conception, statics, and interior design.

Not surprisingly, this project took longer than planned. The encyclopedia's first volume was partly published in 1880, and over the next 63 years more than 100 architects worked on what would become more than 140 individual publications with over 25,000 pages.

One important insight of our work is that targeted text-analysis support, already available today, can easily be integrated into common desktop tools to help users with the task at hand. While NLP techniques are far from perfect or comprehensive, they can already deliver knowledge-discovery support that goes significantly beyond the current approach of full-text search and information retrieval.
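To make the contrast with full-text search concrete, here is a minimal sketch under clearly stated assumptions. The page texts, entity names, and triples are hypothetical toy data, and the `rdf:type` and `mentions` predicates only imitate what an ontology-backed knowledge base might store after NLP-based entity detection; this is not the pipeline used for the Handbuch. A keyword query matches only literal strings, whereas a query over typed entities can return every page that discusses any construction element, even when the words on the page differ.

```python
# Hypothetical toy corpus: page IDs mapped to (German) page text.
PAGES = {
    "p1": "Die Fensterbank wird aus Sandstein gefertigt.",
    "p2": "Das Gesims schützt die Fassade vor Regen.",
    "p3": "Krankenhäuser benötigen gut belüftete Säle.",
}

# Hypothetical (subject, predicate, object) triples, standing in for the
# output of entity detection, lemmatization, and mapping onto an ontology.
TRIPLES = [
    ("Fensterbank", "rdf:type", "ConstructionElement"),
    ("Gesims", "rdf:type", "ConstructionElement"),
    ("Sandstein", "rdf:type", "Material"),
    ("Krankenhaus", "rdf:type", "BuildingType"),
    ("p1", "mentions", "Fensterbank"),
    ("p1", "mentions", "Sandstein"),
    ("p2", "mentions", "Gesims"),
    ("p3", "mentions", "Krankenhaus"),  # lemma of "Krankenhäuser"
]


def full_text_search(term):
    """Return sorted page IDs whose text contains `term` (case-insensitive substring)."""
    needle = term.lower()
    return sorted(pid for pid, text in PAGES.items() if needle in text.lower())


def pages_mentioning_type(concept):
    """Return sorted page IDs that mention any entity typed as `concept`."""
    instances = {s for s, p, o in TRIPLES if p == "rdf:type" and o == concept}
    return sorted({s for s, p, o in TRIPLES if p == "mentions" and o in instances})


if __name__ == "__main__":
    print(full_text_search("Fensterbank"))                  # literal match only
    print(full_text_search("Krankenhaus"))                  # misses the plural form
    print(pages_mentioning_type("ConstructionElement"))     # concept-level query
```

In this toy setup, searching for the literal string "Krankenhaus" misses page p3 because the page uses the plural "Krankenhäuser", while the concept query for `ConstructionElement` returns both p1 and p2 without either page containing that word.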
