A comparison of taxonomy generation techniques using bibliometric methods: applied to research strategy formulation

This paper investigates the modeling of research landscapes through the automatic generation of hierarchical structures (taxonomies) composed of terms related to a given research field. Several taxonomy generation algorithms are discussed and analyzed, each operating on a data set of bibliometric information obtained from a credible online publication database. The algorithms considered are the Dijkstra-Jarnik-Prim (DJP) algorithm, Kruskal's algorithm, Edmonds' algorithm, the Heymann algorithm, and a genetic algorithm. Evaluative experiments attempt to determine which of these algorithms is most likely to output a taxonomy that validly represents the underlying research landscape.

Thesis Co-Supervisor: Stuart Madnick
Title: John Norris Maguire Professor of Information Technologies and Professor of Engineering Systems, Massachusetts Institute of Technology

Thesis Co-Supervisor: Wei Lee Woon
Title: Assistant Professor, Masdar Institute of Science and Technology
