EncyCatalogRec: catalog recommendation for encyclopedia article completion

Online encyclopedias such as Wikipedia provide a large and growing number of articles on many topics, yet many of these articles remain far from complete. In this paper, we propose EncyCatalogRec, a system that helps generate more comprehensive articles by recommending catalog items. First, we represent articles and catalog items as embedding vectors and retrieve similar articles via locality-sensitive hashing; the catalog items of these similar articles serve as the candidate items. We then build a relation graph from the articles and the candidate items and transform it into a product graph, which recasts the recommendation problem as a transductive learning problem on the product graph. Finally, the recommended items are ordered with a learning-to-rank method. Experimental results demonstrate that our approach achieves state-of-the-art performance on catalog recommendation in both warm- and cold-start scenarios. A case study further validates the approach.
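
The candidate-generation step can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not say which hashing family is used, so the sketch assumes random-hyperplane (sign-based) locality-sensitive hashing, and the embedding values, the catalog item names, and the function names `build_lsh_index` and `candidate_items` are all hypothetical.

```python
from collections import defaultdict

import numpy as np


def build_lsh_index(vectors, n_bits=8, seed=0):
    """Bucket article embeddings by the signs of random projections.

    Assumption: random-hyperplane LSH, chosen here only for illustration.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, vectors.shape[1]))
    # One boolean signature per article: which side of each hyperplane it lies on.
    signatures = (vectors @ planes.T) >= 0
    buckets = defaultdict(list)
    for article_idx, signature in enumerate(signatures):
        buckets[signature.tobytes()].append(article_idx)
    return planes, buckets


def candidate_items(query_vec, planes, buckets, article_items):
    """Collect the catalog items of all articles sharing the query's bucket."""
    key = ((planes @ query_vec) >= 0).tobytes()
    candidates = set()
    for article_idx in buckets.get(key, []):
        candidates.update(article_items[article_idx])
    return candidates


# Hypothetical toy data: three article embeddings and their catalog items.
base = np.array([1.0, 2.0, 0.5])
article_vectors = np.stack([base, -base, 2.0 * base])
article_items = [
    {"History", "Geography"},  # same direction as the query
    {"Discography"},           # opposite direction, so every signature bit flips
    {"Economy"},               # same direction, different length
]

planes, buckets = build_lsh_index(article_vectors)
candidates = candidate_items(base, planes, buckets, article_items)
print(sorted(candidates))
```

In this toy setup the query shares a bucket only with the articles whose embeddings point in the same direction, so their catalog items become the candidates. A real index would typically use several hash tables to trade recall against bucket size.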
