AmritaNLP@PAN-RusProfiling: Author Profiling using Machine Learning Techniques

This paper describes our work on the "Gender Identification in Russian texts (RusProfiling)" shared task, hosted by PAN in conjunction with FIRE 2017. The task is to predict an author’s gender from a corpus of Russian-language Twitter data. We give a brief introduction to the task, describe the data set provided by the competition organizers, discuss several feature selection methods, present the experimental analysis we followed for feature representation, and compare the outcomes of the different classifiers that we used for validation. We submitted three models, each with predictions for every test data set, using slightly different pre-processing techniques depending on the content of the test corpus. Because the test corpora were sourced from various platforms, it was challenging to rely on a single representation. According to the global ranking published for the shared task [6], our team secured 2nd position overall (on the concatenation of all data sets), and our third submission performed best of the three on the overall test corpus. Finally, under extended work, we briefly discuss how hyperparameter tuning of certain attributes improved our validation accuracy by 6% over the baseline.