Exploring Automated Text Classification to Improve Keyword Corpus Search Results for Bioinspired Design

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is to abstract the engineering problem into a functional need and then seek solutions to that function through a keyword-type search of text-based biological knowledge. These function keyword search approaches have shown potential for success, but, as with many text-based search methods, they produce a large number of results, many of which have little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages that are more likely to suggest a solution to a human designer. Specifically, we examine whether biological keyword search results can be filtered by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on data from a pair of surveys administered to human subjects, which empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms: a Naive Bayes (NB) classifier, a k-nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, the NB classifier achieves a precision of 0.87, a recall of 0.52, and an F score of 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find that biological jargon feature words are correlated with unhelpful sentences.
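
As a rough illustration of the kind of classifier described above, the following Python sketch fits a multinomial Naive Bayes model to sentences represented as lists of word stems and then recomputes the F score from the reported precision and recall. It is not the authors' implementation: the stems and labels are invented toy data, the function names `train_nb`, `predict_nb`, and `f_score` are hypothetical, add-one smoothing is an assumed design choice, and the F score is assumed to be the balanced (F1) harmonic mean, which the abstract does not state explicitly.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing.

    docs is a list of stem lists; labels is a parallel list of class names.
    """
    vocab = {stem for doc in docs for stem in doc}
    class_counts = Counter(labels)
    stem_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        stem_counts[label].update(doc)

    model = {}
    for c in class_counts:
        total = sum(stem_counts[c].values())
        log_prior = math.log(class_counts[c] / len(labels))
        log_like = {
            stem: math.log((stem_counts[c][stem] + 1) / (total + len(vocab)))
            for stem in vocab
        }
        model[c] = (log_prior, log_like)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen stems are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[s] for s in doc if s in log_like)
    return max(model, key=score)


def f_score(precision, recall):
    """Balanced F score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy training data (NOT from the paper's surveys): action-like stems
# are labeled helpful, jargon-like stems unhelpful.
train_docs = [
    ["attach", "surfac", "grip"],
    ["pump", "fluid", "contract"],
    ["phylogeni", "taxon", "genus"],
    ["genus", "speci", "clade"],
]
train_labels = ["helpful", "helpful", "unhelpful", "unhelpful"]

model = train_nb(train_docs, train_labels)
prediction = predict_nb(model, ["grip", "surfac"])
print(prediction)

# Consistency check on the reported figures, assuming the F score is F1.
print(round(f_score(0.87, 0.52), 2))  # 0.65
```

Under the F1 assumption, the reported precision of 0.87 and recall of 0.52 reproduce the reported F score of 0.65. The gap between the two indicates a filter that rarely keeps an unhelpful sentence but misses roughly half of the helpful ones.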
