Mining User-Generated Repair Instructions from Automotive Web Communities

The objective of this research was to automatically extract user-generated repair instructions from large amounts of web data. To this end, an artifact was created that classifies whether or not a web post contains a repair instruction. Methods from Natural Language Processing transform the unstructured text of a web post into a set of numerical features that can then be processed by different machine learning algorithms. The main contribution of this research lies in the design and prototypical implementation of these features. The evaluation shows that the artifact can accurately distinguish posts containing repair instructions from other posts, such as those containing problem reports. With such a solution, a company can save much of the time and money previously spent on performing this classification task manually.
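To illustrate the idea of turning a post into numerical features, the following is a minimal sketch and not the feature set designed in this research. The function name `extract_features`, the cue word lists, and the five features are hypothetical and were chosen only for illustration.

```python
import re

# Hypothetical cue lists, assumed for illustration only.
ACTION_VERBS = {"remove", "replace", "unscrew", "disconnect",
                "install", "tighten", "check"}
SEQUENCE_WORDS = {"first", "then", "next", "afterwards", "finally"}


def extract_features(post: str) -> list[float]:
    """Map a raw web post to a small vector of numerical features."""
    # Lowercased word tokens; digits and punctuation are ignored.
    tokens = re.findall(r"[a-z']+", post.lower())
    n_tokens = max(len(tokens), 1)  # avoid division by zero

    # Rough sentence split on terminal punctuation and line breaks.
    sentences = [s for s in re.split(r"[.!?\n]+", post) if s.strip()]
    n_sentences = max(len(sentences), 1)

    # Lines that start like an enumerated step, e.g. "1." or "2)".
    numbered_steps = len(re.findall(r"(?m)^\s*\d+[.)]\s", post))

    return [
        float(len(tokens)),                               # post length
        sum(t in ACTION_VERBS for t in tokens) / n_tokens,    # action verb ratio
        sum(t in SEQUENCE_WORDS for t in tokens) / n_tokens,  # sequence word ratio
        float(numbered_steps),                            # enumerated steps
        post.count("?") / n_sentences,                    # question marks per sentence
    ]


instruction = "1. Disconnect the battery.\n2. Remove the old fuse, then install the new one."
report = "My car makes a strange noise. Does anyone know why?"

print(extract_features(instruction))
print(extract_features(report))
```

Feature vectors of this kind could then be passed, together with labels, to any standard classifier. Which features and which algorithm work best is an empirical question that this sketch does not answer.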
