MINING OF TEXTUAL DATABASES WITHIN THE PRODUCT DEVELOPMENT PROCESS

