Collective taxonomizing: A collaborative approach to organizing document repositories

Keeping large, growing document repositories organized is a critical challenge. For example, the intelligence failure before the 9/11 attacks was partly due to the ineffective organization of documents shared among intelligence agencies. Drawing on the success of Web 2.0 and on theories from knowledge management, we argue that a shared document repository with no central organizer may benefit from collective taxonomizing: allowing community members to categorize documents using their own local hierarchies, and then systematically coalescing those local hierarchies into a global taxonomy. Following a design science approach, we develop and evaluate a hierarchy coalescing algorithm. Empirical and analytical evaluations indicate that the approach is promising.
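
The abstract does not describe the coalescing algorithm itself. As a rough illustration of what merging local hierarchies into a global taxonomy can involve, the Python sketch below rests entirely on its own assumptions, none taken from the paper: each user's hierarchy is a list of nested dicts whose categories hold document IDs; categories from different users are merged when the Jaccard similarity of their document sets reaches a hypothetical `threshold`; and each merged category is placed under the smallest merged category whose documents strictly contain its own. The functions `flatten`, `jaccard`, and `coalesce` and the toy data are hypothetical, and this single-link merge is a stand-in technique, not the authors' algorithm.

```python
# Hypothetical sketch of hierarchy coalescing; not the paper's algorithm.


def flatten(node, owner):
    """Return (owner, name, docs) for node and all descendants.

    A category's docs include every document filed anywhere in its subtree.
    The first tuple in the returned list is always the node itself.
    """
    out = []
    docs = set(node.get("docs", []))
    for child in node.get("children", []):
        child_nodes = flatten(child, owner)
        docs |= child_nodes[0][2]  # child's own (subtree-wide) doc set
        out.extend(child_nodes)
    return [(owner, node["name"], frozenset(docs))] + out


def jaccard(a, b):
    """Jaccard similarity of two document sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coalesce(local_hierarchies, threshold=0.6):
    """Merge categories across users, then link them by strict containment.

    Returns {global_label: parent_label or None}. The threshold is an
    assumed tuning parameter.
    """
    cats = [c for owner, roots in local_hierarchies.items()
            for root in roots for c in flatten(root, owner)]
    parent = list(range(len(cats)))  # union-find forest over categories

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single-link merging: only cross-user pairs are compared directly,
    # but transitivity may still group categories of the same user.
    for i in range(len(cats)):
        for j in range(i + 1, len(cats)):
            if cats[i][0] != cats[j][0] and jaccard(cats[i][2], cats[j][2]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, (_, name, docs) in enumerate(cats):
        names, union = clusters.setdefault(find(i), (set(), set()))
        names.add(name)
        union |= docs

    merged = {" / ".join(sorted(names)): frozenset(docs)
              for names, docs in clusters.values()}

    # Parent = smallest merged category whose docs strictly contain ours.
    # Strict containment is a partial order, so no cycles can arise.
    taxonomy = {}
    for label, docs in merged.items():
        supersets = [(len(d), other) for other, d in merged.items() if docs < d]
        taxonomy[label] = min(supersets)[1] if supersets else None
    return taxonomy


example = {
    "alice": [{"name": "Security", "children": [
        {"name": "Terrorism", "docs": ["d1", "d2"]},
        {"name": "Cyber", "docs": ["d3", "d4"]}]}],
    "bob": [{"name": "Threats", "children": [
        {"name": "Terror", "docs": ["d1", "d2"]},
        {"name": "Hacking", "docs": ["d3", "d4", "d5"]}]}],
}

print(coalesce(example))
```

On this toy data, the two users' "Terrorism"/"Terror" and "Cyber"/"Hacking" categories are merged and placed under the merged "Security"/"Threats" category. A practical coalescing algorithm would also have to handle partial overlaps, conflicting names, and categories that match nothing, which this sketch ignores.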
