Automated extraction of product comparison matrices from informal product descriptions

We propose a procedure for extracting comparison matrices from informal product descriptions. The resulting matrices contain a large amount of comparable information and can supplement or even refine technical descriptions. A user study shows that our automated approach retrieves a significant portion of correct information. Users can compute, control, edit and refine matrices in a Web environment called MatrixMiner.

Domain analysts, product managers, and customers aim to capture the important features and differences among a set of related products. Reviewing each product description case by case is laborious and time-consuming, and it fails to deliver a condensed view of a product family. In this article, we investigate the use of automated techniques for synthesizing a product comparison matrix (PCM) from a set of product descriptions written in natural language. We describe a tool-supported process, based on term recognition, information extraction, clustering, and similarities, that is capable of identifying and organizing features and values in a PCM, despite the informality and absence of structure in the textual descriptions of products. We evaluate our proposal against numerous categories of products mined from BestBuy. Our empirical results show that the synthesized PCMs contain a large amount of quantitative, comparable information that can potentially complement or even refine technical descriptions of products. The user study shows that our automatic approach is capable of extracting a significant portion of correct features and correct values. This approach has been implemented in MatrixMiner, a web environment with interactive support for automatically synthesizing PCMs from informal product descriptions. MatrixMiner also maintains traceability with the original descriptions and the technical specifications for further refinement or maintenance by users.
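
To make the notion of a PCM concrete, the following is a minimal sketch of how feature/value pairs pulled from free-text descriptions could be arranged into a matrix with one row per product and one column per feature. It is not the MatrixMiner implementation: a single hand-written regular expression stands in for the term recognition, information extraction, and clustering steps described above, and the product names, descriptions, pattern, and function names (`extract_pairs`, `build_pcm`) are all hypothetical.

```python
import re

# Hypothetical toy descriptions; these are not data from the article.
DESCRIPTIONS = {
    "Laptop A": "Features 8GB memory and a 15.6 inch display. 500GB storage.",
    "Laptop B": "Comes with 4GB memory, a 14 inch display and Bluetooth.",
}

# Assumed pattern: a number, a unit, then a one-word feature name.
# A real pipeline would use term recognition instead of a fixed regex.
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s?(GB|inch)\s+([a-z]+)")


def extract_pairs(text):
    """Return a {feature: value} mapping found in one description."""
    return {feature: f"{num} {unit}" for num, unit, feature in PATTERN.findall(text)}


def build_pcm(descriptions):
    """Arrange extracted pairs as a matrix: products as rows, features as columns."""
    rows = {name: extract_pairs(text) for name, text in descriptions.items()}
    features = sorted({f for cells in rows.values() for f in cells})
    # An empty string marks a feature that a description does not mention.
    matrix = {
        name: [cells.get(f, "") for f in features] for name, cells in rows.items()
    }
    return features, matrix


if __name__ == "__main__":
    features, matrix = build_pcm(DESCRIPTIONS)
    print("product", *features, sep=" | ")
    for name, cells in matrix.items():
        print(name, *cells, sep=" | ")
```

Empty cells in this sketch correspond to information that a description simply omits, which is one reason a synthesized matrix can be complemented by the technical specifications.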
