Data Mining: 16th Australasian Conference, AusDM 2018, Bathurst, NSW, Australia, November 28–30, 2018, Revised Selected Papers

Online abuse directed towards women on social media platforms such as Twitter has attracted considerable attention in recent years. An automated method that effectively identifies misogynistic abuse could improve our understanding of the patterns and driving factors of abusive tweets, and of the effectiveness of responses to them, over a sustained time period. However, training a neural network (NN) model to detect misogynistic tweets from a small set of labelled data is difficult. This is partly due to the complex nature of tweets that contain misogynistic content, and partly due to the vast number of parameters that must be learned in a NN model. We have conducted a series of experiments to investigate how to train a NN model to detect misogynistic tweets effectively. In particular, we have customised and regularised a Convolutional Neural Network (CNN) architecture and shown that word vectors pre-trained on a task-specific domain can be used to train a CNN model effectively when only a small set of labelled data is available. A CNN model trained in this way yields improved accuracy over state-of-the-art models.
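To make the kind of model described in the abstract more concrete, here is a minimal sketch of the forward pass of a generic single-layer CNN text classifier. Each token is looked up in a pre-trained word-vector table. Convolutional filters of different widths slide over the tweet, and max-over-time pooling keeps the strongest activation per filter. Dropout regularises the pooled features during training, and a logistic output gives a misogyny score. This is not the authors' customised architecture. The embedding table, filter weights and output weights are hypothetical hand-set values standing in for vectors pre-trained on domain tweets and for learned parameters. Dropout is assumed as the regulariser, and training (backpropagation) is omitted.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def embed(tokens: Sequence[str], vectors: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up each token in a pre-trained embedding table (hypothetical here).

    Out-of-vocabulary tokens map to a zero vector in this sketch.
    """
    return [vectors.get(tok, [0.0] * dim) for tok in tokens]


def conv_max_pool(emb: List[Vector], filt: List[Vector], bias: float) -> float:
    """Slide one filter (len(filt) words wide) over the sentence, apply ReLU,
    and keep the largest activation (max-over-time pooling)."""
    width = len(filt)
    best = 0.0  # ReLU outputs are >= 0; also the result if the sentence is too short
    for start in range(len(emb) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(w * x for w, x in zip(filt[offset], emb[start + offset]))
        best = max(best, s)  # ReLU is folded in because best starts at 0.0
    return best


def dropout(features: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout (assumed regulariser): zero each pooled feature with
    probability p and rescale the survivors by 1 / (1 - p)."""
    return [0.0 if rng.random() < p else f / (1.0 - p) for f in features]


def predict(
    tokens: Sequence[str],
    vectors: Dict[str, Vector],
    filters: List[List[Vector]],
    biases: List[float],
    out_w: List[float],
    out_b: float,
    dim: int,
    train: bool = False,
    p: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a score in (0, 1); values above 0.5 are read as 'misogynistic'."""
    emb = embed(tokens, vectors, dim)
    feats = [conv_max_pool(emb, f, b) for f, b in zip(filters, biases)]
    if train:  # dropout is only applied while training
        feats = dropout(feats, p, rng or random.Random(0))
    z = out_b + sum(w * f for w, f in zip(out_w, feats))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) output


if __name__ == "__main__":
    # Toy 2-d vectors standing in for embeddings pre-trained on domain tweets.
    vecs = {"you": [0.1, 0.0], "are": [0.0, 0.1], "awful": [0.9, 0.8], "great": [-0.5, 0.7]}
    # One bigram filter and one unigram filter, hand-set rather than learned.
    filters = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, -1.0]]]
    for text in ("you are awful", "you are great"):
        score = predict(text.split(), vecs, filters, [0.0, 0.0], [1.0, 0.5], -1.0, dim=2)
        print(f"{text!r}: {score:.3f}")
```

Max-over-time pooling makes the feature vector length depend only on the number of filters, not on the tweet length. This makes the design convenient for short, variable-length texts. In a real system the filter and output weights would be learned, and the embedding table would come from a word-embedding model trained on in-domain text.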
