Materials Science Literature-Patent Relevance Search: A Heterogeneous Network Analysis Approach

In recent decades, the volume of materials science literature and patents has grown exponentially. This growth makes it increasingly difficult to determine whether the literature is current, because there can be a gap between when a patent is filed and when it is granted. Moreover, given the variety and volume of materials science data, it is difficult to ensure that a patent cites the appropriate prior art, especially when the relevant data reside in two separate sources with different curation mechanisms and purposes: publications and patents. Relational databases, which are generally used to store publications, pose a further challenge: their strict tabular schemas may not be appropriate for organizing and querying the highly interconnected information about materials found in these publications and patents. For example, elements combine chemically to form a compound, which can then be converted into other compounds through chemical reactions. Furthermore, relational databases are not designed to combine data from multiple sources in various formats, which makes it difficult to discover relevance between publications and patents. To explore an alternative way of representing materials data and of combining data from multiple sources in a single repository, we propose a solution that integrates data from the Open Quantum Materials Database (OQMD) and patent data from the USPTO database into a network, which we call the heterogeneous materials information network (HMIN). We generalize prior work on meta path-based topological features for exploring such networks, and we propose features that identify network noise and capture the relatedness between objects of different types, as our application requires. Using these features, we built several machine learning models to explore the relevance between materials science publications and patents.
Experimental results show that HMIN can help researchers effectively discover related publications and patents that were originally kept in separate sources. Our work offers the materials community a new way to appropriately represent materials data and to discover connections between data from multiple sources.
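The idea of linking publications, materials, and patents in one typed network and deriving meta path-based topological features from it can be illustrated with a small sketch. This is not the authors' implementation: the node types (Publication, Material, Element, Patent), the identifiers such as `pub:1` and `pat:A`, the helper names `HeteroNetwork`, `path_count`, and `meta_path_features`, and the two meta paths used are all hypothetical choices for illustration. The feature shown is the raw path count along each meta path; normalized variants are common alternatives.

```python
from collections import defaultdict


class HeteroNetwork:
    """Toy heterogeneous information network with typed nodes and undirected edges."""

    def __init__(self):
        self.node_type = {}
        # node -> neighbor type -> set of neighbors of that type
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v):
        # Both endpoints must already have a type.
        self.adj[u][self.node_type[v]].add(v)
        self.adj[v][self.node_type[u]].add(u)

    def path_count(self, src, dst, meta_path):
        """Number of path instances from src to dst whose node types follow meta_path.

        Counts walks, so an instance may revisit a node (for example, return to
        the same material through a shared element)."""
        if self.node_type.get(src) != meta_path[0]:
            return 0
        frontier = {src: 1}  # node -> number of partial instances ending there
        for next_type in meta_path[1:]:
            step = defaultdict(int)
            for node, count in frontier.items():
                for nbr in self.adj[node][next_type]:
                    step[nbr] += count
            frontier = step
        return frontier.get(dst, 0)


def meta_path_features(net, src, dst, meta_paths):
    """Hypothetical feature vector for a (src, dst) pair: one path count per meta path."""
    return [net.path_count(src, dst, mp) for mp in meta_paths]


# Hypothetical toy data: two publications, one patent, three compounds, their elements.
net = HeteroNetwork()
for pub in ("pub:1", "pub:2"):
    net.add_node(pub, "Publication")
net.add_node("pat:A", "Patent")
for mat in ("Fe2O3", "LiFePO4", "TiO2"):
    net.add_node(mat, "Material")
for el in ("Fe", "O", "Ti", "Li", "P"):
    net.add_node(el, "Element")

# Publications and patents that mention a material.
for u, v in [("pub:1", "Fe2O3"), ("pub:1", "LiFePO4"),
             ("pub:2", "TiO2"), ("pat:A", "LiFePO4")]:
    net.add_edge(u, v)

# Compounds linked to the elements they contain.
composition = {"Fe2O3": ("Fe", "O"), "LiFePO4": ("Li", "Fe", "P", "O"), "TiO2": ("Ti", "O")}
for mat, elements in composition.items():
    for el in elements:
        net.add_edge(mat, el)

PMT = ("Publication", "Material", "Patent")
PMEMT = ("Publication", "Material", "Element", "Material", "Patent")

print(meta_path_features(net, "pub:1", "pat:A", [PMT, PMEMT]))  # [1, 6]
print(meta_path_features(net, "pub:2", "pat:A", [PMT, PMEMT]))  # [0, 1]
```

In this toy graph, `pub:1` mentions the same compound as the patent and therefore scores higher on both meta paths than `pub:2`, which reaches the patent only through the shared element O. Vectors like these could serve as inputs to a supervised classifier that labels publication-patent pairs as related or unrelated; which models, meta paths, and normalizations HMIN actually uses is not specified here.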
