Resource management for model learning at entity level

Many current and emerging applications aim to provide entity-specific predictions, ranging from individualized healthcare to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we showed that error-weighted ensembles can improve prediction quality by combining entity-centric classifiers, each trained only on the reviews of one particular product (entity), with entity-ignorant classifiers, which are trained on all reviews irrespective of the product. This improvement came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again because their entities receive no further instances in the stream. To overcome this drawback and make entity-centric learning viable in such scenarios, we investigated two methods for reducing the primary memory requirement of our entity-centric approach. The first method uses the lossy counting algorithm for data streams to identify the entities whose instances make up a certain percentage of the total data stream, within an error margin. The models of all entities that do not fulfil this requirement are moved to secondary memory, from which they can be retrieved if further instances belonging to them arrive later in the stream. The second method replaces the entity-centric models with a much more naive model that stores only the past labels and predicts the majority label seen so far. We applied both methods to the previously used Amazon data sets, which contain up to 1.4M reviews, and added two subsets of the Yelp data set, which contain up to 4.2M reviews. Both methods reduced the primary memory requirements while still outperforming an entity-ignorant model.
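The two memory-saving ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `LossyCounter` and `MajorityLabelModel`, the parameter names `epsilon` and `support`, and the example entity identifiers are assumptions made for illustration. The counter follows the standard lossy counting scheme (buckets of width ceil(1/epsilon), pruning at bucket boundaries), and the step of moving the models of infrequent entities to secondary memory is left out.

```python
import math
from collections import Counter


class LossyCounter:
    """Approximate per-entity frequency counts over a stream (lossy counting)."""

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must lie in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # instances seen so far
        self.entries = {}                    # entity -> [count, max undercount]

    def add(self, entity):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if entity in self.entries:
            self.entries[entity][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 earlier buckets.
            self.entries[entity] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {
                e: cd for e, cd in self.entries.items() if cd[0] + cd[1] > bucket
            }

    def frequent(self, support):
        """Entities whose counted share is at least support - epsilon."""
        threshold = (support - self.epsilon) * self.n
        return {e for e, (count, _) in self.entries.items() if count >= threshold}


class MajorityLabelModel:
    """Naive per-entity model: stores label counts, predicts the majority label."""

    def __init__(self):
        self.label_counts = Counter()

    def learn(self, label):
        self.label_counts[label] += 1

    def predict(self, default=None):
        if not self.label_counts:
            return default  # no labels seen yet for this entity
        return self.label_counts.most_common(1)[0][0]


# Hypothetical stream: one popular product interleaved with one-off products.
counter = LossyCounter(epsilon=0.1)
for i in range(10):
    counter.add("popular_product")
    counter.add(f"one_off_product_{i}")
frequent_entities = counter.frequent(support=0.3)

model = MajorityLabelModel()
for label in ["positive", "negative", "positive"]:
    model.learn(label)
prediction = model.predict()
```

In a setting like the one described, only the entities returned by `frequent` would keep their models in primary memory. The majority-label model needs only one counter per label for each entity, which is why it is far cheaper to keep than a full classifier.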
