Automated Discovery of Product Feature Inferences Within Large-Scale Implicit Social Media Data

Recently, social media has emerged as an alternative, viable source from which to extract large-scale, heterogeneous product features in a time- and cost-efficient manner. One of the challenges of utilizing social media data to inform product design decisions is the existence of implicit data such as sarcasm, which accounts for 22.75% of social media data and can potentially create bias in the predictive models that learn from such data sources. For example, if a customer says “I just love waiting all day while this song downloads”, an automated product feature extraction model may incorrectly associate the positive sentiment of “love” with the cell phone’s ability to download. While traditional text mining techniques are designed to handle well-formed text in which product features are explicitly inferred from the combination of words, these tools fail to process social messages that include implicit product feature information. In this paper, we propose a method that enables designers to utilize implicit social media data by translating each implicit message into its equivalent explicit form, using the word concurrence network. A case study of Twitter messages that discuss smartphone features is used to validate the proposed method. The results from the experiment not only show that the proposed method improves the interpretability of implicit messages, but also shed light on potential applications in the design domains where this work could be extended.
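
The abstract does not spell out how the word concurrence network is built or how the translation step uses it. As a rough illustration only, the sketch below shows one common way to represent such a network: counting how often pairs of words appear in the same message. The function names (`tokenize`, `build_cooccurrence`, `top_neighbors`), the whitespace tokenizer, and the toy messages are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(message):
    """Lowercase a message, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?\"'") for w in message.lower().split())
    return [w for w in words if w]


def build_cooccurrence(messages):
    """Count, for every word pair, the number of messages containing both words.

    Hypothetical helper: the paper may weight or filter edges differently.
    """
    network = defaultdict(Counter)
    for message in messages:
        # Use the set of words so a pair is counted once per message.
        unique_words = sorted(set(tokenize(message)))
        for a, b in combinations(unique_words, 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def top_neighbors(network, word, k=3):
    """Return the k words that most often share a message with `word`."""
    return network.get(word, Counter()).most_common(k)


# Toy messages, invented for illustration.
messages = [
    "download speed is slow",
    "slow download again",
    "love the camera",
]
network = build_cooccurrence(messages)
strongest = top_neighbors(network, "download", k=1)
```

In this toy corpus, "download" shares a message with "slow" more often than with any other word, which hints at how neighbors in such a network could supply the explicit wording for an implicit message. Whether the proposed method selects replacements this way is not stated in the abstract.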
