A New Recommendation Approach Based on Probabilistic Soft Clustering Methods: A Scientific Documentation Case Study

Clustering in recommender systems (RSs) is an important issue, both for improving collaborative filtering (CF) accuracy and for obtaining analytical information from highly sparse datasets. RS items and users usually share features belonging to different clusters, e.g., a musical-comedy movie. Soft clustering is therefore the most natural approach to CF clustering. In this paper, we propose a new prediction approach for probabilistic soft clustering methods. In addition, we test a non-traditional scientific documentation CF dataset, SD4AI, and compare the results with the MovieLens baseline. Non-traditional CF datasets have challenging features, such as irregular rating frequency distributions, a broad range of rating values, and particularly high sparsity. The results show the suitability of soft clustering approaches, whose probabilistic overlapping parameters reach their optimum values when a balance between hard and soft clustering is used. This paper opens some promising lines of research, such as the use of RSs in the scientific documentation field, the processing of Internet of Things-based datasets, and the design of new model-based soft clustering methods.
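
To make the idea of soft membership concrete, the following minimal sketch shows how a user who belongs partly to several clusters could receive a prediction as a membership-weighted average of per-cluster mean ratings. This is an illustrative assumption and not the prediction approach proposed in the paper; the function names, the affinity scores, and the cluster means are all hypothetical.

```python
import numpy as np


def soft_memberships(scores):
    """Normalize non-negative affinity scores so each row sums to 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum(axis=1, keepdims=True)


def predict(user_memberships, cluster_item_means):
    """Membership-weighted average of per-cluster mean ratings."""
    return user_memberships @ cluster_item_means


# Hypothetical toy data: 2 users, 2 clusters, 3 items.
affinity = [[3.0, 1.0],   # user 0 leans towards cluster 0
            [1.0, 1.0]]   # user 1 sits evenly between both clusters
cluster_item_means = np.array([[5.0, 1.0, 3.0],   # cluster 0 mean ratings
                               [1.0, 5.0, 3.0]])  # cluster 1 mean ratings

memberships = soft_memberships(affinity)
predictions = predict(memberships, cluster_item_means)

# Hard clustering keeps only the strongest cluster and discards the overlap.
hard_labels = memberships.argmax(axis=1)
```

In this toy setting, hard clustering would assign both users to a single cluster, whereas the soft memberships preserve the fact that the second user is shared equally between the two.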
