Semantics in the Wild

Robert J. Glushko (glushko@ischool.berkeley.edu)
School of Information, University of California, Berkeley CA 94720

Paul P. Maglio (pmaglio@almaden.ibm.com)
IBM Almaden Research Center
650 Harry Road, San Jose, CA 95120-6099

Teenie Matlock (tmatlock@ucmerced.edu)
Cognitive Science Program, University of California, Merced
P.O. Box 2039, Merced, CA 95344

Lawrence W. Barsalou (barsalou@emory.edu)
Psychology Department, Emory University
532 Kilgo Circle, Atlanta GA 30322

Keywords: Language; categorization; semantics; interoperability; Web 2.0; tagging.

Introduction

Traditionally, cognitive science has focused on the mental representation of abstract and concrete concepts through laboratory experiments (e.g., Smith, Shoben, & Rips, 1974; Rosch & Mervis, 1975). Subsequent, more applied research in semantics, especially in corporate settings at Bell Labs, IBM, Xerox PARC, and elsewhere, involved field studies on how people naturally describe objects and computing processes (e.g., Furnas, Landauer, Gomez, & Dumais,

Because different people often use different categories and words to refer to the same things, and they use the same ones to refer to different things, library science sought to train professional indexers and cataloguers to follow precise rules with controlled vocabularies (e.g., Taylor, 2004). But library science was late to recognize the potential impact of computing technology on the vocabulary problem. The most significant innovations in information organization and retrieval emerged from the cognitive and computer sciences, including the use of embedded thesauri and ontologies in information systems and techniques for latent semantic indexing (see Dumais, 2003).

Yet in today’s world of ubiquitous computing and ubiquitous information resources, we interact daily with a bewildering variety of information types, and we constantly make choices about whether and how to organize them. It is now impossible to rely on professionals to describe and catalog information resources, which proliferate exponentially as web pages, office documents, and multimedia objects that often include photos and videos from digital cameras and cell phones. And though sophisticated indexing by web search engines, such as Google, can compensate for the lack of explicit description of information resources, much of the information we encounter and use is in fact never indexed by such systems.

So rather than relying on professionals or automatic computational methods, many people have begun to impose their own semantic structure on the information and processes they encounter by tagging information with their own keywords and categories and then sharing the tags broadly or even publicly (see Hammond, Hannay, Lund, & Scott, 2005). Distributed or social categorization systems include del.icio.us for bookmarking and tagging web pages, flickr for storing and sharing photos, and youtube for videos.[1] These new, rich information environments containing semantically tagged content would seem to provide a perfect opportunity for research on semantics in the wild.[2] And so we think it is time to reconsider the nature of research on semantics.

[1] See http://del.icio.us/, http://www.flickr.com/, http://www.youtube.com/.
[2] With apologies to Hutchins (1996).

Symposium Structure

The symposium will include a series of four interrelated talks on semantics in the wild. The first two will focus on practical issues in technology and business, and the second two will focus on scientific issues in linguistics and psychology. The goal is to begin a new conversation in semantics research that is grounded in the problems presented by and the opportunities afforded by modern computational and business environments. We now describe each of the participants and their potential presentation topics in turn.

Robert J. Glushko is an adjunct professor in the School of Information at the University of California, Berkeley. After receiving a Ph.D. in Cognitive Psychology from UC San Diego in 1979, he spent over twenty years in corporate R&D, in consulting, and as a Silicon Valley entrepreneur before returning to the university. Glushko’s interests lie in methods and tools for the design, development, and deployment of information- and