A Text Mining Framework for Discovering Technological Intelligence to Support Science and Technology Management

Science and technology (S&T) information is a rich resource, essential for managing research and development (R&D) programs. R&D management has long been a labor-intensive process that relies heavily on the accumulated knowledge of experts within the organization. Furthermore, the rapid pace of S&T growth has significantly increased the complexity of R&D management. Fortunately, the parallel growth of information and of analytical tools offers the promise of advanced decision aids that support R&D management more effectively. Information retrieval, data mining, and other information-based technologies are receiving increased attention. In this thesis, a framework based on text mining techniques is proposed to discover useful intelligence implicit in large bodies of electronic text sources. This intelligence is a prime requirement for successful R&D management. The research extends the approach called “Technology Opportunities Analysis” (developed by the Technology Policy and Assessment Center, Georgia Institute of Technology, in conjunction with Search Technology, Inc.) to create the proposed framework. The commercial software VantagePoint is used mainly to perform basic analyses. In addition to the functions in VantagePoint, this thesis implements a novel text association rule mining algorithm for gathering related concepts among text data. Two further algorithms based on text association rule mining are also implemented. The first, called “tree-structured networks,” captures important aspects of both parent-child (hierarchical) and sibling (non-hierarchical) relations among related terms. The second, called “concept-grouping,” constructs term thesauri for data preprocessing. Finally, the framework is applied to Thai S&T publications. The results of the study can help support strategic decision-making on the direction of S&T programs in Thailand.
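
To make the idea of association rule mining over text concrete, the following is a minimal Python sketch that treats each document as a set of terms and derives pairwise rules using the standard support and confidence measures. It is not the thesis's algorithm, whose details are not given in this abstract. The function name `term_association_rules`, the thresholds, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def term_association_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return pairwise rules as (antecedent, consequent, support, confidence).

    docs is a list of term lists, one per document (assumed input format).
    support    = fraction of documents containing both terms
    confidence = documents containing both / documents containing antecedent
    """
    n = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # count each term once per document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test both directions, since confidence is asymmetric.
        for x, y in ((a, b), (b, a)):
            confidence = count / term_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical keyword sets standing in for publication records.
docs = [
    ["text mining", "association rules", "clustering"],
    ["text mining", "association rules"],
    ["text mining", "bibliometrics"],
    ["association rules", "text mining", "thesaurus"],
]
rules = term_association_rules(docs, min_support=0.5, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

In this toy data, "association rules" always co-occurs with "text mining" but not the reverse, so only one direction passes the confidence threshold. An asymmetry of this kind is one plausible signal for a parent-child relation between terms, although the abstract does not say whether the tree-structured networks are built this way.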
