From Scattered Sources to Comprehensive Technology Landscape: A Recommendation-based Retrieval Approach

Mapping the technology landscape is crucial for market actors to make informed investment decisions. However, given the large amount of data on the Web and the resulting information overload, manually retrieving information is an ineffective and incomplete approach. In this work, we propose an end-to-end recommendation-based retrieval approach to support the automatic retrieval of technologies and their associated companies from raw Web data. This is a two-task setup involving (i) technology classification of entities extracted from a company corpus, and (ii) technology and company retrieval based on the classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT, a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique that simultaneously supports retrieving related companies, technologies related to a specific company, and companies relevant to a technology. To evaluate these tasks, we also construct a data set that includes company documents and the entities extracted from these documents, together with company categories and technology labels. Experiments show that our approach returns 4 times more relevant companies while outperforming a traditional retrieval baseline in retrieving technologies.
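To make the three retrieval directions concrete, the following is a minimal sketch under stated assumptions and not the method evaluated in this work: it assumes entities have already been classified as technologies, and it swaps in plain cosine similarity over binary company-technology sets in place of a learned recommendation model. The company names, technology labels, and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: technologies already classified per company.
company_techs = {
    "AcmeAI": {"machine learning", "computer vision"},
    "VisionCo": {"computer vision", "lidar"},
    "DriveX": {"lidar", "autonomous driving", "computer vision"},
    "LedgerLabs": {"blockchain"},
}


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def related_companies(company, k=2):
    """Rank other companies by overlap of their technology sets."""
    target = company_techs[company]
    scores = [
        (other, cosine(target, techs))
        for other, techs in company_techs.items()
        if other != company
    ]
    scores = [(c, s) for c, s in scores if s > 0]
    return sorted(scores, key=lambda x: (-x[1], x[0]))[:k]


def recommend_technologies(company, k=2):
    """Score technologies the company lacks by similarity-weighted votes."""
    target = company_techs[company]
    votes = {}
    for other, techs in company_techs.items():
        if other == company:
            continue
        sim = cosine(target, techs)
        for tech in techs - target:
            votes[tech] = votes.get(tech, 0.0) + sim
    ranked = [(t, s) for t, s in votes.items() if s > 0]
    return sorted(ranked, key=lambda x: (-x[1], x[0]))[:k]


def companies_for_technology(tech):
    """Direct lookup of companies whose documents mention a technology."""
    return sorted(c for c, techs in company_techs.items() if tech in techs)


if __name__ == "__main__":
    print(related_companies("AcmeAI"))
    print(recommend_technologies("AcmeAI"))
    print(companies_for_technology("lidar"))
```

In this toy setting, a company with no technology in common with the query (here the hypothetical LedgerLabs) contributes nothing to the ranking. A learned model could instead place companies and technologies in a shared embedding space, which may surface relations that exact set overlap misses.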
