How to measure the semantic similarities between scientific papers and patents in order to discover uncommercialized research fronts: A case study of solar cells

In this paper, we perform a comparative study of methods for measuring the semantic similarity between academic papers and patents. Research fronts that do not correspond to any patents may be uncommercialized and therefore represent opportunities for industry, so it is important to investigate the relationship between scientific outcomes and industrial technology. We compare the structure of the citation network of scientific publications with that of patents by citation analysis, measure the similarity between sets of academic papers and sets of patents by natural language processing, and discuss the validity of the results with experts. After the documents (papers/patents) in each layer are categorized by a citation-based method, we compare three semantic similarity measures between a set of academic papers and a set of patents: the Jaccard coefficient, the cosine similarity of tf-idf vectors, and the cosine similarity of log-tf-idf vectors. A case study on solar cells is performed to develop a method for investigating the correspondence between papers and patents. The results show that, among the three measures, the cosine similarity of tf-idf vectors is the best at discovering this correspondence. The proposed approach enables us to obtain, at least, candidates for unexplored research fronts, where academic research exists but patents do not.
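As an illustration only, the three similarity measures named in the abstract can be sketched in a few lines of Python. The abstract does not give the exact weighting formulas, so everything here is an assumption: each cluster is treated as one bag of whitespace-separated tokens, idf is taken as log(N/df) + 1, and the "log" variant is taken to mean a dampened term frequency of 1 + log(tf). The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real term extraction)."""
    return text.lower().split()


def jaccard(tokens_a, tokens_b):
    """Jaccard coefficient between the term sets of two clusters."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def idf_table(clusters):
    """Assumed idf: log(N / df) + 1, computed over all clusters."""
    n = len(clusters)
    df = Counter()
    for tokens in clusters:
        df.update(set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf, log_tf=False):
    """Sparse tf-idf vector; log_tf=True uses the assumed 1 + log(tf) damping."""
    tf = Counter(tokens)
    return {
        term: ((1.0 + math.log(count)) if log_tf else float(count)) * idf.get(term, 0.0)
        for term, count in tf.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy clusters (invented text, not data from the study).
paper_cluster = tokenize("silicon solar cell efficiency silicon")
patent_cluster = tokenize("silicon solar cell module")

idf = idf_table([paper_cluster, patent_cluster])
scores = {
    "jaccard": jaccard(paper_cluster, patent_cluster),
    "cosine_tfidf": cosine(
        tfidf_vector(paper_cluster, idf), tfidf_vector(patent_cluster, idf)
    ),
    "cosine_log_tfidf": cosine(
        tfidf_vector(paper_cluster, idf, log_tf=True),
        tfidf_vector(patent_cluster, idf, log_tf=True),
    ),
}
```

In this sketch, a paper cluster whose best score against every patent cluster stays low would be flagged as a candidate unexplored research front. Any threshold for "low" would have to be chosen and validated separately, for example with the expert review the abstract describes.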
