Positive unlabeled learning for building recommender systems in a parliamentary setting

Abstract

Our goal is to learn about the political interests and preferences of Members of Parliament (MPs) by mining their parliamentary activity, in order to develop a recommendation/filtering system that determines how relevant documents should be distributed among MPs. We propose using positive unlabeled learning to tackle this problem. We only have information about relevant documents (the interventions of each MP in debates) and none about irrelevant ones, so it is not possible to use standard binary classifiers, which must be trained with both positive and negative examples. We have also developed a new positive unlabeled learning algorithm that compares favorably with: (a) a baseline approach that assumes every intervention by any other MP is irrelevant; (b) another well-known positive unlabeled learning method; and (c) an approach based on information retrieval methods that matches documents against legislators’ representations. The experiments were conducted with data from the Andalusian Parliament, a regional parliament in Spain.
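The difference between the baseline (a) and the positive unlabeled framing can be sketched as follows. The sketch assumes that each intervention is available as a hypothetical `(mp_id, text)` pair. The function names and the toy corpus are illustrative assumptions. The sketch only shows how the training data for one MP is framed. It does not implement the new algorithm or any particular classifier.

```python
from typing import List, Tuple

# Hypothetical representation: one (mp_id, intervention_text) pair per intervention.
Intervention = Tuple[str, str]


def baseline_dataset(interventions: List[Intervention], mp: str) -> List[Tuple[str, int]]:
    """Baseline (a): the MP's own interventions are labeled positive (1), and
    every intervention by any other MP is assumed to be irrelevant (0)."""
    return [(text, 1 if speaker == mp else 0) for speaker, text in interventions]


def pu_dataset(interventions: List[Intervention], mp: str) -> Tuple[List[str], List[str]]:
    """Positive unlabeled view: the MP's own interventions form the positive set P.
    All other interventions are left unlabeled (U), because some may be relevant."""
    positives = [text for speaker, text in interventions if speaker == mp]
    unlabeled = [text for speaker, text in interventions if speaker != mp]
    return positives, unlabeled


# Toy corpus with hypothetical MP identifiers.
corpus = [
    ("mp_a", "water management in rural areas"),
    ("mp_b", "funding for public hospitals"),
    ("mp_a", "irrigation subsidies for farmers"),
    ("mp_c", "drought and water reserves"),
]

P, U = pu_dataset(corpus, "mp_a")
print(len(P), len(U))  # 2 2
```

In this toy corpus, the baseline labels mp_c's intervention on drought and water reserves as a negative example for mp_a. That topic is plausibly relevant to mp_a, which shows why treating every other MP's intervention as irrelevant is a questionable assumption. A positive unlabeled method would instead learn from P and U without forcing such labels.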
