Incorporating Dependency Trees Improves Identification of Pregnant Women on Social Media Platforms

The increasing popularity of social media leads users to share an enormous amount of information on the internet. This information has various applications; for example, it can be used to develop models that understand or predict user behavior on social media platforms. A few online retailers, for instance, have studied shopping patterns to predict a shopper's pregnancy stage. Another interesting application is to use social media platforms to analyze users' health-related information. For this study, we developed a new corpus from the popular social media platform Twitter. Using this corpus, we developed a tree kernel-based model to classify tweets conveying pregnancy-related information. The model achieved an accuracy of 0.847 and an F-score of 0.565. In the future, we would like to improve this corpus by reducing noise such as retweets.
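To make the idea of a tree kernel concrete, the following is a minimal sketch of a subset tree kernel in the style of Collins and Duffy, which scores two trees by counting the tree fragments they share. The abstract does not state which kernel or tree representation the model uses, so everything here is an assumption for illustration: the nested-tuple tree encoding, the toy dependency-style trees, and the names `nodes`, `production`, `delta`, and `tree_kernel` are all hypothetical.

```python
from itertools import product


def nodes(tree):
    """Yield every internal node of a tree encoded as nested tuples.

    Assumed encoding: (label, child, child, ...), where a child is either
    another tuple (internal node) or a plain string (leaf word).
    """
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Node label plus its children's labels, keeping leaves and internal nodes distinct."""
    children = tuple(
        (c[0], True) if isinstance(c, tuple) else (c, False) for c in node[1:]
    )
    return (node[0], children)


def delta(n1, n2, decay):
    """Decay-weighted count of shared fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Matching productions guarantee c1 and c2 are both leaves or both internal.
        if isinstance(c1, tuple):
            score *= 1.0 + delta(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Sum delta over all pairs of internal nodes of the two trees."""
    return sum(delta(a, b, decay) for a, b in product(nodes(t1), nodes(t2)))


# Toy trees (hypothetical): relation labels as internal nodes, words as leaves.
t1 = ("ROOT", ("nsubj", "i"), ("cop", "am"), ("root", "pregnant"))
t2 = ("ROOT", ("nsubj", "she"), ("cop", "is"), ("root", "pregnant"))

print(tree_kernel(t1, t1))  # 11.0: every fragment of t1 matches itself
print(tree_kernel(t1, t2))  # 3.0: only fragments that avoid the differing words match
```

A kernel like this can be supplied to a support vector machine as a precomputed similarity matrix over tweets. In practice the values are usually normalized, and a `decay` below 1 is used so that large shared fragments do not dominate.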
