Text mining using database tomography and bibliometrics: A review

Abstract

Database tomography (DT) is a textual database analysis system with two major components: (1) algorithms that extract multiword phrase frequencies and phrase proximities (the physical closeness of multiword technical phrases) from any type of large textual database, which augment (2) the interpretative capabilities of the expert human analyst. DT has been used to derive technical intelligence from a variety of textual database sources, most recently the published technical literature as exemplified by the Science Citation Index (SCI) and the Engineering Compendex (EC). Phrase frequency analysis (the occurrence frequency of multiword technical phrases) identifies the pervasive technical themes of the topical databases of interest, and phrase proximity analysis identifies the relationships among those themes. In the structured published literature databases, bibliometric analysis of the database records supplements the DT results by identifying: the most prolific recent authors in the topical area; the journals that contain numerous topical area papers; the institutions that produce numerous topical area papers; the keywords specified most frequently by the topical area authors; the authors whose works are cited most frequently in the topical area papers; and the particular papers and journals cited most frequently in the topical area papers. This review paper summarizes: (1) the theory and background development of DT; (2) past published and unpublished literature study results; (3) present application activities; and (4) potential expansion to new DT applications. In addition, the application of DT to technology forecasting is addressed.
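
The two algorithmic steps named in the abstract, phrase frequency analysis and phrase proximity analysis, can be illustrated with a minimal Python sketch. This is not the DT implementation: the function names (`tokenize`, `phrase_frequencies`, `phrase_proximities`), the fixed phrase length `n`, the word `window`, and the simple lowercase tokenizer are all assumptions made for illustration. The real system also relies on an expert analyst to select and interpret the phrases.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; no stop-word filtering.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(docs, n=2):
    """Count every n-word phrase across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def phrase_proximities(docs, theme, n=2, window=5):
    """Count n-word phrases lying within `window` words of each
    occurrence of `theme` (hypothetical proximity measure)."""
    theme_tokens = tokenize(theme)
    theme_norm = " ".join(theme_tokens)
    m = len(theme_tokens)
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - m + 1):
            if tokens[i:i + m] != theme_tokens:
                continue
            # Start positions of candidate phrases around this occurrence.
            lo = max(0, i - window)
            hi = min(len(tokens) - n, i + m - 1 + window)
            for j in range(lo, hi + 1):
                phrase = " ".join(tokens[j:j + n])
                if phrase != theme_norm:  # skip the theme phrase itself
                    counts[phrase] += 1
    return counts


if __name__ == "__main__":
    sample_docs = [
        "Shock wave interaction in hypersonic flow.",
        "Hypersonic flow over a blunt body produces a shock wave.",
    ]
    print(phrase_frequencies(sample_docs).most_common(3))
    print(phrase_proximities(sample_docs, "hypersonic flow").most_common(3))
```

In this sketch, high-frequency phrases stand in for the pervasive technical themes, and the phrases that recur near a chosen theme stand in for its related themes. A practical version would filter stop words and leave phrase selection to the analyst.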
