CRITICAL QUESTIONS FOR BIG DATA APPROACH IN KNOWLEDGE REPRESENTATION AND ORGANIZATION

We live in the age of big data, in which the production and analysis of massive amounts of data about the interactions of humans, objects and technologies have become commonplace. It comes as no surprise that the knowledge organization community has also embraced data-driven inquiry to advance the representation, organization and discovery of knowledge. In particular, semantic technologies have made it possible to connect knowledge across institutions, platforms and cultures, bringing a new dimension to the representation and organization of knowledge. This paper presents an analysis of knowledge organization research that employed large-scale or big data analysis techniques, in order to identify the methodologies, research questions, and implications of big data approaches. An analysis of over 500 scholarly works indexed in the Library and Information Science Full text and Google Scholar databases suggests advantages of large-scale data integration approaches. For instance, Baca and Gill (2015) show how semantic technologies have allowed multilingual and cross-cultural representation of the Getty Art & Architecture Thesaurus (AAT), the Getty Thesaurus of Geographic Names (TGN) and the Union List of Artist Names (ULAN). Mayr and Zeng (2017) argue that semantic web standards such as SKOS, OWL, RDFS, and SPARQL allow knowledge organization systems (KOS) to be published as Linked Open Data (LOD). They propose the following outcomes of applying LOD: transformation of KOS vocabularies into lightweight OWL ontologies or SKOS vocabulary datasets, and access to the data by means of SPARQL endpoints.
However, data-driven knowledge organization initiatives raise significant questions: whether data-driven access to knowledge would facilitate or transform the use and accessibility of knowledge organization systems, whether it would help us understand how humans represent, organize and discover knowledge, and whether it would usher in new forms of bias, limitation and privacy incursion. A large body of knowledge representation and organization research has discussed various biases of knowledge organization systems, such as those affecting the representation of marginalized and indigenous populations. For instance, indigenous scholars have demonstrated that a lack of understanding of indigenous epistemologies in the representation of indigenous cultures has resulted in limited and partial representation of indigenous knowledge (Doyle, 2006; Metoyer & Doyle, 2015). Moreover, algorithmic biases built into platforms and systems, such as the Google search engine, are another major concern, particularly when user-generated content is used to complement the traditional representation of resources. The data-driven approach also raises ethical issues related to the incorporation of user-generated content without users’ consent. In this regard, Ibekwe-SanJuan and Bowker (2017) examine the relevance of universal bibliographic classifications and thesauri, arguing that big data will not remove the need for human-constructed systems. The authors also suggest a shift from a purely universalist, top-down approach to more descriptive, bottom-up approaches that could potentially include diverse viewpoints. Taking into consideration the complexity of the process of knowledge representation, we argue that a data-driven approach would have little to no effect on eliminating the limitations and biases of existing knowledge organization and discovery systems.
This study suggests that it is necessary to critically interrogate the advantages of the big data approach to knowledge representation and organization, in order to spark conversations about the cultural, technological, scholarly, societal and ethical implications of a data-driven approach to knowledge representation, organization and discovery. It argues that while a data-driven approach would certainly be valuable in providing large-scale representation of knowledge, only human- and community-centered approaches to knowledge representation and organization would ensure a multifaceted and rich representation of knowledge.

References
Baca, M., & Gill, M. (2015). Encoding multilingual knowledge systems in the digital age: The Getty vocabularies. The Fifth North American Symposium on Knowledge Organization (NASKO 2015), June 18-19, 2015, Los Angeles, California. Retrieved from http://www.iskocus.org/NASKO2015proceedings/Gill%20.pdf
Doyle, A. M. (2006). Naming and reclaiming indigenous knowledge: Intersections of landscape and experience. In G. Budin, C. Swertz & K. Mitgutsch (Eds.), Advances in knowledge organization (10), Knowledge Organization for a Global Learning Society: Proceedings of the Ninth International ISKO Conference in Vienna, Austria, 2006, Ergon Verlag, Würzburg, pp. 435-442.
Ibekwe-SanJuan, F., & Bowker, G. C. (2017). Implications of big data for knowledge organization. Knowledge Organization, 44(3), 187-198.
Mayr, P., & Zeng, M. (2017). Knowledge organization systems in the semantic web. International Society for Knowledge Organization (ISKO), UK Conference 2017, September 11-12, 2017, London, UK. Retrieved from http://www.iskouk.org/content/knowledge-organization-systems-kos-semantic-web
Metoyer, C. A., & Doyle, A. M. (2015). Introduction. Cataloging & Classification Quarterly, 53(5-6), 475-478.