Decision Algorithm for the Automatic Determination of the Use of Non-Inclusive Terms in Academic Texts

Inclusive language, like many other gender equality initiatives, has attracted considerable attention in recent years. Gender equality offices in universities and public administrations cannot cope with manually checking the documents their institutions produce for non-inclusive language. This research introduces an automated, machine-learning-based solution for detecting non-inclusive uses of Spanish in doctoral theses written at Spanish universities. A large dataset was used to train and validate the model and to analyze the use of inclusive language. The result is an algorithm that detects non-inclusive usages in any Spanish text document, with error, false positive, and false negative ratios slightly over 10%, and precision, recall, and F-measure over 86%. The results also show how the ratio of non-inclusive usages per document has evolved over time, with a pronounced reduction in the final years under study.
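As a reading aid for the evaluation measures named above, the following minimal sketch computes them from the four counts of a binary confusion matrix. It is not the authors' code: the function name, the example counts, and the exact definitions of the false positive and false negative ratios (taken here as FP/(FP+TN) and FN/(FN+TP)) are assumptions, since the abstract does not state how those ratios were computed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumption: the 'positive' class is a non-inclusive usage. The ratio
    definitions below are the conventional ones, not confirmed by the source.
    """
    def safe_div(num, den):
        # Avoid ZeroDivisionError when a class is absent from the sample.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        # F-measure: harmonic mean of precision and recall (F1).
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        # Error: share of all decisions that were wrong.
        "error": safe_div(fp + fn, tp + fp + fn + tn),
        "false_positive_ratio": safe_div(fp, fp + tn),
        "false_negative_ratio": safe_div(fn, fn + tp),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only; not data from the study.
    for name, value in classification_metrics(tp=90, fp=10, fn=10, tn=90).items():
        print(f"{name}: {value:.3f}")
```

With these definitions, precision and the false positive ratio answer different questions: the first is measured over everything the classifier flagged, the second over everything that was actually inclusive.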
