A Study of Gender Discussions in Mobile Apps

Mobile software apps ("apps") are among the prevailing digital technologies on which modern life heavily depends. A key issue in app development is how to design gender-inclusive apps. Apps whose design does not consider gender inclusion, diversity, and equality can create barriers for their diverse users (e.g., excluding some users because of their gender). While there have been some efforts to develop gender-inclusive apps, a lack of deep understanding of user perspectives on gender may prevent app developers and owners from identifying gender-related issues and proposing solutions for improvement. Users express many different opinions about apps in their reviews, from sharing their experiences and reporting bugs to requesting new features. In this study, we aim to unpack gender discussions about apps from the user perspective by analysing app reviews. First, we develop and evaluate several Machine Learning (ML) and Deep Learning (DL) classifiers that automatically detect gender reviews (i.e., reviews that contain discussions about gender). We apply our ML and DL classifiers to a manually constructed dataset of 1,440 app reviews from the Google App Store, comprising 620 gender reviews and 820 non-gender reviews. Our best classifier achieves an F1-score of 90.77%. Second, our qualitative analysis of 388 randomly selected gender reviews (out of the 620) shows that gender discussions in app reviews revolve around six topics: App Features, Appearance, Content, Company Policy and Censorship, Advertisement, and Community. Finally, we provide practical implications and recommendations for developing gender-inclusive apps.
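
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score for the positive class (here, "gender review") is computed from true and predicted labels. The function name `f1_score` and the toy labels are illustrative assumptions; they are not the study's data, classifiers, or evaluation code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = gender review, 0 = non-gender review)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight reviews (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2%}")
```

Because F1 is the harmonic mean of precision and recall, it penalises a classifier that does well on one and poorly on the other, which matters when the two classes are unevenly sized, as with the 620 versus 820 split described above.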