Knowledge Discovery and Management for Product Design through Text Mining - a Case Study of Online Information Integration for Designers

Product innovation and design are often regarded as information- and knowledge-intensive activities. While existing tools, e.g. search engines and product data management systems, give the design community some assistance in information storage, processing and retrieval, many challenges remain, especially if we intend to provide advanced capabilities that help designers cope with information overload. These capabilities include, but are not restricted to, automated classification of design-relevant documents based on product architecture, relevance analysis and summarization of design documents, identification of product functions from customer feedback and reviews during conceptual design, personal design information subscription and management, and generation of market analysis and competitive intelligence reports. To meet these demands, an integrated text mining system for knowledge discovery and management in product design is proposed. In this paper, we report our study on the integration of online information with the internal knowledge base, e.g. the product taxonomy, as part of our efforts towards realizing the proposed system. Several key techniques are explained, namely a maximal frequent word sequence algorithm for discovering quality phrases, a document profile model based on salient semantic information, and a concept-based automated text classification approach. Experimental studies demonstrate the effectiveness of the approach.
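To make the idea of maximal frequent word sequences more concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes a simplified definition: a sequence is a contiguous run of words, it is frequent if it occurs in at least `min_support` documents, and it is maximal if no longer frequent sequence contains it. Published maximal frequent sequence algorithms typically also allow gaps between words, which this brute-force version omits. The function names and the `max_len` cap are illustrative assumptions.

```python
from collections import defaultdict


def frequent_ngrams(docs, min_support, max_len=5):
    """Return contiguous word n-grams that occur in at least min_support documents.

    Simplifying assumption: whitespace tokenization and lowercasing only.
    """
    support = defaultdict(int)
    for doc in docs:
        words = doc.lower().split()
        seen = set()  # count each n-gram at most once per document
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
        for gram in seen:
            support[gram] += 1
    return {g: s for g, s in support.items() if s >= min_support}


def is_contiguous_subsequence(short, long):
    """True if the tuple `short` appears as a contiguous slice of `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_frequent_sequences(docs, min_support, max_len=5):
    """Keep only frequent n-grams not contained in a longer frequent n-gram."""
    frequent = frequent_ngrams(docs, min_support, max_len)
    maximal = [
        g for g in frequent
        if not any(
            len(h) > len(g) and is_contiguous_subsequence(g, h)
            for h in frequent
        )
    ]
    return sorted(maximal)
```

Calling `maximal_frequent_sequences(docs, min_support=2)` on a small list of document strings returns the surviving phrases as word tuples. Such phrases could then serve as candidate features for a document profile, although how the paper builds its profiles is not specified in the abstract.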
