A Big Data approach to gender classification in Twitter: Notebook for PAN at CLEF 2018

This paper describes a statistical approach to gender classification of tweets, designed with a Big Data perspective in mind. We began by developing our own implementation of the Low Dimensionality Representation (LDR) method, with the aim of adding statistics that had not been used in the original implementation, such as skewness, kurtosis and central moments. An exploratory analysis of these new features showed that skewness was especially informative, which we attribute to the problem having only two classes. Our approach therefore uses skewness alone, both to describe the differences in language use between men and women and to predict gender on the test dataset.
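The feature pipeline outlined above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The per-term class weight `W[t][c]` is a simplified, hypothetical stand-in for the LDR weighting: the share of a term's occurrences that fall in class `c`. The nearest-centroid rule over skewness values is also a hypothetical choice, because the abstract does not say how skewness is turned into a prediction. All function names are illustrative.

```python
from collections import Counter, defaultdict
import math


def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)^k."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)


def skewness(xs):
    """Population skewness m3 / m2^(3/2); returns 0.0 for constant input."""
    m2 = central_moment(xs, 2)
    return 0.0 if m2 == 0 else central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2^2 - 3; returns 0.0 for constant input."""
    m2 = central_moment(xs, 2)
    return 0.0 if m2 == 0 else central_moment(xs, 4) / m2 ** 2 - 3.0


def term_class_weights(docs, labels):
    """Hypothetical simplified weighting: W[t][c] = share of t's occurrences in class c."""
    per_class = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        per_class[c].update(tokens)
    totals = Counter()
    for counts in per_class.values():
        totals.update(counts)
    classes = sorted(per_class)
    weights = {t: {c: per_class[c][t] / n for c in classes} for t, n in totals.items()}
    return weights, classes


def doc_features(tokens, weights, classes):
    """Per class: [mean, variance (m2), skewness, excess kurtosis] of the doc's known-term weights."""
    feats = {}
    for c in classes:
        xs = [weights[t][c] for t in tokens if t in weights]
        if not xs:
            feats[c] = [0.0, 0.0, 0.0, 0.0]
        else:
            feats[c] = [sum(xs) / len(xs), central_moment(xs, 2),
                        skewness(xs), excess_kurtosis(xs)]
    return feats


def skew_vector(tokens, weights, classes):
    """Keep only the skewness statistic, one value per class."""
    feats = doc_features(tokens, weights, classes)
    return [feats[c][2] for c in classes]


def fit_centroids(docs, labels, weights, classes):
    """Hypothetical classifier: average skewness vector of each class."""
    sums = {c: [0.0] * len(classes) for c in classes}
    counts = Counter(labels)
    for tokens, c in zip(docs, labels):
        v = skew_vector(tokens, weights, classes)
        sums[c] = [a + b for a, b in zip(sums[c], v)]
    return {c: [a / counts[c] for a in sums[c]] for c in classes}


def predict(tokens, weights, classes, centroids):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    v = skew_vector(tokens, weights, classes)
    return min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
```

Usage: build `weights, classes = term_class_weights(train_docs, train_labels)`, then `centroids = fit_centroids(train_docs, train_labels, weights, classes)`, then call `predict(tokens, weights, classes, centroids)` on each test document. In this sketch, with two classes the weights of every known term sum to 1. Since the skewness of 1 − x is the negative of the skewness of x, the two per-class skewness values of a document are mirror images, so one number per document summarizes both.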