Mining unstructured content for recommender systems: an ensemble approach

Recommending textual documents requires indexing mechanisms that extract structured metadata for attribute-aware recommender systems. Applying a variety of text mining algorithms has the advantage of capturing different aspects of unstructured content, resulting in richer descriptions. However, it is difficult to integrate these descriptions into a single model in a way that effectively improves recommendation accuracy. This article proposes a generic model based on ensemble learning that combines simple text mining methods in a post-processing approach. After each text mining technique is executed, each set of metadata of a particular type is supplied to the recommender module, which generates attribute-specific rankings. The resulting recommendations are then ensembled to generate a final personalized ranking for the user. We evaluated our ensemble technique with two attribute-aware collaborative recommenders (k-Nearest Neighbors and BPR-Mapping), and we demonstrate its generality by comparing different types of ensembles. We used two datasets from different domains: the first is from the Brazilian Embrapa Agency of Technology Information website, whose documents are written in Portuguese, and the second is the HetRec MovieLens 2k dataset, published by the GroupLens Research Group, whose movies’ storylines are written in English. The experiments show that, particularly for the k-NN recommender, better accuracy can be obtained when multiple metadata types are combined. The proposed approach is extensible and can accommodate new indexing and recommendation techniques.
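The post-processing step described above, in which attribute-specific rankings are merged into one final ranking, can be illustrated with a minimal sketch. The abstract does not say which combination rule the authors use, so the weighted Borda count below is an assumption chosen for simplicity. The function name `borda_ensemble`, the metadata type names (`terms`, `named_entities`, `topics`), and the document ids are all hypothetical.

```python
from collections import defaultdict


def borda_ensemble(rankings, weights=None, top_n=None):
    """Merge several ranked item lists into one with a weighted Borda count.

    NOTE: Borda count is an assumed combination rule used for illustration;
    it is not necessarily the ensemble strategy used in the paper.

    rankings: dict mapping a metadata type to a list of item ids, best first.
    weights:  optional dict mapping a metadata type to a float weight.
    top_n:    optional cut-off for the final ranking.
    """
    if weights is None:
        weights = {name: 1.0 for name in rankings}

    scores = defaultdict(float)
    for name, ranked_items in rankings.items():
        n = len(ranked_items)
        for position, item in enumerate(ranked_items):
            # The top item of a list earns n points, the last one earns 1.
            scores[item] += weights[name] * (n - position)

    # Sort by descending score; ties are broken by item id for determinism.
    combined = sorted(scores, key=lambda item: (-scores[item], item))
    return combined[:top_n] if top_n is not None else combined


# Hypothetical attribute-specific rankings produced for a single user,
# one list per metadata type extracted by a different text mining method.
rankings = {
    "terms": ["d3", "d1", "d2"],
    "named_entities": ["d1", "d3", "d4"],
    "topics": ["d1", "d2", "d3"],
}

final_ranking = borda_ensemble(rankings)
print(final_ranking)
```

Because the combination happens after each recommender has produced its ranking, a new metadata type can be added by supplying one more entry in `rankings`, without changing the recommender itself. The optional `weights` argument is one way to let more reliable metadata types count for more.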
