Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets

Software support tickets contain short, noisy text written by customers. Software products are often referred to by various surface forms and informal abbreviations. Automatically identifying software mentions in support tickets and determining their official names and versions is helpful for many downstream applications, e.g., routing tickets to the right expert groups for support. In this work, we study the problem of software product name extraction and linking from support tickets. We first annotate and analyze sampled tickets to understand their language patterns. Next, we design features for the extraction and linking models using local, contextual, and external information sources. In experiments, we show that linear models with the proposed features deliver better and more consistent results than state-of-the-art baseline models, even on a dataset with sparse labels.
