Paper Information - Clustering Heterogeneous Semi-structured Social Science Datasets

Clustering Heterogeneous Semi-structured Social Science Datasets

Abstract: Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides a new path for understanding such datasets, which we illustrate with three real-world examples: the Global Terrorism Database, details of every terrorist attack since 1970; a Chicago police dataset, details of every drug-related incident over a period of approximately a month; and a dataset describing members of a Hezbollah crime/terror network within the U.S.
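The abstract names the pipeline only at a high level, so the following is a minimal sketch of one way such a process could look, not the authors' implementation. It assumes records arrive as Python dictionaries with possibly missing fields, maps numeric fields to one column each (missing values filled with the column mean) and categorical fields to one-hot columns, standardizes the columns, and projects the records onto the leading singular vectors. The helper names `encode_records` and `svd_embed`, the field names, and the sample values are all hypothetical and are not taken from any of the three datasets.

```python
import numpy as np


def encode_records(records, fields):
    """Map semi-structured records (dicts) to a numeric matrix.

    Assumed encoding (not from the paper): numeric fields become one column
    with missing values replaced by the column mean; all other fields become
    one-hot columns, with a missing value encoded as all zeros.
    """
    columns, names = [], []
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None]
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            mean = sum(present) / len(present)
            columns.append([mean if v is None else float(v) for v in values])
            names.append(field)
        else:
            for cat in sorted({str(v) for v in present}):
                columns.append(
                    [1.0 if v is not None and str(v) == cat else 0.0 for v in values]
                )
                names.append(f"{field}={cat}")
    return np.array(columns, dtype=float).T, names


def svd_embed(matrix, k=2):
    """Standardize columns, then project rows onto the top-k singular vectors."""
    centered = matrix - matrix.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    z = centered / std
    u, s, _vt = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k], s


# Hypothetical toy records; the second one lacks the "fatalities" field.
records = [
    {"year": 1975, "attack_type": "bombing", "fatalities": 0},
    {"year": 1976, "attack_type": "bombing"},
    {"year": 2005, "attack_type": "armed assault", "fatalities": 12},
    {"year": 2006, "attack_type": "armed assault", "fatalities": 9},
]
matrix, names = encode_records(records, ["year", "attack_type", "fatalities"])
coords, singular_values = svd_embed(matrix, k=2)
print(names)
print(coords)
```

Each row of `coords` gives a record's position in the low-dimensional space. Plotting those positions and coloring the points by a single attribute is one way to approximate the overlay visualizations the abstract mentions, and groups of nearby points can be read as candidate clusters.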

Christian Leuprecht | David B. Skillicorn

[1] Usman Qamar, et al. Attack Type Prediction Using Hybrid Classifier, 2014, ADMA.

[2] William Ribarsky, et al. Visual analysis of entity relationships in the Global Terrorism Database, 2008, SPIE Defense + Commercial Sensing.

[3] Gary LaFree, et al. The Global Terrorism Database: Accomplishments and Challenges, 2010.

[4] Walter Enders, et al. Domestic versus transnational terrorism: Data, decomposition, and dynamics, 2011.

[5] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.