Reddit-TUDFE: practical tool to explore Reddit usability in data science and knowledge processing

ABSTRACT This contribution argues that Reddit, as a massive, categorized, open-access dataset, can be used to capture knowledge on “almost any topic”. The presented analysis is based on 180 manually annotated papers related to Reddit and on data acquired from top databases of scientific papers. Moreover, an open-source tool is introduced that provides easy access to Reddit resources and supports exploratory data analysis of how Reddit covers selected topics.
