Identifying Maturity Rating Levels of Online Books

With the huge number of books available today, it is challenging to find reading materials suitable for a given reader, especially books that match the maturity levels of children and adolescents. Assessing the age-appropriateness of books by hand is time-consuming, since it can take a human up to three hours to read a single book, and the relatively low cost of creating literary content makes age-suitable reading material even harder to discover. To address this problem, we propose a maturity-rating-level detection tool based on neural network models. Given the text of a book, the proposed model predicts the book's content rating level in each of seven categories: (i) crude humor/language; (ii) drug, alcohol, and tobacco use; (iii) kissing; (iv) profanity; (v) nudity; (vi) sex and intimacy; and (vii) violence and horror. The empirical study demonstrates that the mature content of online books can be accurately predicted by computers using natural language processing and machine learning techniques. Experimental results also verify the merit of the proposed model, which outperforms a number of baseline models and well-known existing maturity-rating prediction tools.
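To make the shape of the prediction task concrete, the following is a minimal sketch of the model's output. It covers one rating level for each of the seven categories, decoded from per-category class probabilities. The four-level scale (`Level`), the `decode_ratings` helper, and the use of plain argmax decoding are all hypothetical assumptions for illustration; the actual levels, architecture, and decoding used by the proposed model are not specified here. An ordinal-aware decoder could behave differently.

```python
from enum import Enum

# The seven content categories listed in the abstract.
CATEGORIES = (
    "crude humor/language",
    "drug, alcohol, and tobacco use",
    "kissing",
    "profanity",
    "nudity",
    "sex and intimacy",
    "violence and horror",
)


# Hypothetical ordinal rating scale (assumption; not taken from the paper).
class Level(Enum):
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def decode_ratings(probs):
    """Pick one rating level per category by argmax over class probabilities.

    probs: dict mapping each category name to a list of probabilities, one per
    Level, e.g. the softmax output of a hypothetical per-category classifier head.
    """
    ratings = {}
    for category in CATEGORIES:
        p = probs[category]
        if len(p) != len(Level):
            raise ValueError(f"expected {len(Level)} probabilities for {category!r}")
        # Index of the most probable class becomes the predicted level.
        best = max(range(len(p)), key=p.__getitem__)
        ratings[category] = Level(best)
    return ratings


if __name__ == "__main__":
    # Illustrative (made-up) probabilities, not model output.
    example = {c: [0.7, 0.2, 0.05, 0.05] for c in CATEGORIES}
    example["violence and horror"] = [0.1, 0.2, 0.6, 0.1]
    for category, level in decode_ratings(example).items():
        print(f"{category}: {level.name}")
```

Treating each category as its own output reflects the abstract's framing: a book receives a separate rating in every category rather than a single overall age label.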
