Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g., product reviews or the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge of the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where knowledge about the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive, and we attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
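The abstract describes the approach only at a high level. The following minimal sketch illustrates the entity-centric idea rather than the authors' actual algorithm: the stream is split into per-entity substreams, only the first m observations of each entity are labelled, and a later observation of an entity is labelled by a majority vote over the seed labels of its k most similar entities. The class name EntityCentricKNN, the mean-vector entity profiles, the cosine similarity, and the cold-start fallback are illustrative assumptions not specified in the abstract.

```python
# Illustrative sketch of entity-centric stream classification (not the
# paper's implementation). Entities are compared via the mean vector of
# their m labelled seed observations; predictions are majority votes over
# the seeds of the k most similar entities.
from collections import defaultdict, Counter
import numpy as np


class EntityCentricKNN:
    def __init__(self, k=3, m=5):
        self.k, self.m = k, m
        self.seed_obs = defaultdict(list)     # entity -> labelled feature vectors
        self.seed_labels = defaultdict(list)  # entity -> labels of those vectors

    def observe_labelled(self, entity, x, y):
        """Store one of the initial m labelled observations of an entity."""
        if len(self.seed_obs[entity]) < self.m:
            self.seed_obs[entity].append(np.asarray(x, dtype=float))
            self.seed_labels[entity].append(y)

    def _profile(self, entity):
        """Entity profile: mean of its labelled seed observations."""
        return np.mean(self.seed_obs[entity], axis=0)

    def predict(self, entity, x):
        """Predict the label of a new, unlabelled observation of `entity`.

        In this simplification the arriving features `x` are not used; the
        vote relies only on the entity's seed and its nearest entities.
        """
        if entity not in self.seed_obs:
            # Cold start: fall back to a global majority vote over all seeds.
            all_labels = [y for ys in self.seed_labels.values() for y in ys]
            return Counter(all_labels).most_common(1)[0][0]
        target = self._profile(entity)
        sims = []
        for other, _ in self.seed_obs.items():
            p = self._profile(other)
            sim = p @ target / (np.linalg.norm(p) * np.linalg.norm(target) + 1e-12)
            sims.append((sim, other))
        neighbours = [e for _, e in sorted(sims, reverse=True)[: self.k]]
        votes = Counter(y for e in neighbours for y in self.seed_labels[e])
        return votes.most_common(1)[0][0]


# Hypothetical usage: labels arrive only for the seed, later observations
# of "user_1" are classified from the seeds of similar entities.
clf = EntityCentricKNN(k=2, m=3)
clf.observe_labelled("user_1", [1.0, 0.2], "positive")
clf.observe_labelled("user_2", [0.9, 0.1], "positive")
clf.observe_labelled("user_3", [0.1, 0.9], "negative")
print(clf.predict("user_1", [0.8, 0.3]))
```

A similarity computed in a different feature space (e.g., entity metadata rather than observation features), as mentioned in the abstract, could be plugged in by replacing the `_profile`-based cosine similarity.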
