A Lossy Counting Based Approach for Learning on Streams of Graphs on a Budget

In many problem settings, for example in graph domains, online learning algorithms on streams of data must respect strict time constraints dictated by the throughput at which the data arrive. When only a limited amount of memory (a budget) is available, a learning algorithm will eventually need to discard some of the information used to represent the current solution, which negatively affects its classification performance. More importantly, the overhead of budget management may significantly increase the computational burden of the learning algorithm. In this paper we present a novel approach inspired by the Passive Aggressive and the Lossy Counting algorithms. Our algorithm uses a fast procedure for deleting the least influential features. Moreover, it can estimate the weighted frequency of each feature and use it for prediction.
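A minimal sketch of how the two ingredients named above can be combined, assuming each graph has already been mapped to a sparse dictionary of feature values (for instance by the explicit feature map of a graph kernel, not shown). Lossy Counting splits the stream into buckets of ceil(1/epsilon) examples, stores for each feature a count and an error bound `delta`, and at every bucket boundary deletes the entries whose count plus `delta` does not exceed the current bucket id; the surviving features carry a PA-I weight. The names `LossyCountingPA` and `Entry` and the parameters `epsilon` and `C` are hypothetical. The sketch counts plain feature occurrences rather than a weighted frequency, and it is not the authors' algorithm.

```python
import math
from dataclasses import dataclass
from typing import Dict

# A sparse example: feature id -> value (assumed: features already extracted from a graph).
Features = Dict[str, float]


@dataclass
class Entry:
    weight: float = 0.0  # PA weight of the feature
    count: int = 0       # occurrences counted since the feature was (re)inserted
    delta: int = 0       # upper bound on occurrences missed before insertion


class LossyCountingPA:
    """Hypothetical sketch: a PA-I linear classifier whose feature table is
    pruned with Lossy Counting. Not the algorithm from the paper."""

    def __init__(self, epsilon: float = 0.25, C: float = 1.0) -> None:
        self.width = math.ceil(1.0 / epsilon)  # bucket width, in examples
        self.C = C                              # PA-I cap on the step size
        self.n = 0                              # examples processed so far
        self.table: Dict[str, Entry] = {}

    def score(self, x: Features) -> float:
        # Features that were never seen or were pruned contribute nothing.
        return sum(self.table[f].weight * v for f, v in x.items() if f in self.table)

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Features, y: int) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)     # current bucket id (1-based)
        loss = max(0.0, 1.0 - y * self.score(x))    # hinge loss before the update
        # Lossy Counting bookkeeping: one occurrence per feature per example.
        for f in x:
            if f in self.table:
                self.table[f].count += 1
            else:
                self.table[f] = Entry(count=1, delta=bucket - 1)
        # PA-I step: w <- w + tau * y * x with tau = min(C, loss / ||x||^2).
        sq_norm = sum(v * v for v in x.values())
        if loss > 0 and sq_norm > 0:
            tau = min(self.C, loss / sq_norm)
            for f, v in x.items():
                self.table[f].weight += tau * y * v
        # At each bucket boundary, drop features whose count + delta <= bucket.
        if self.n % self.width == 0:
            self.table = {f: e for f, e in self.table.items()
                          if e.count + e.delta > bucket}


if __name__ == "__main__":
    model = LossyCountingPA(epsilon=0.25, C=1.0)  # bucket width 4
    stream = [({"a": 1.0, "rare1": 1.0}, 1), ({"b": 1.0}, -1),
              ({"a": 1.0}, 1), ({"b": 1.0, "rare2": 1.0}, -1)]
    for x, y in stream:
        model.update(x, y)
    print(sorted(model.table))  # ['a', 'b']
```

In the toy stream, `rare1` and `rare2` occur once each and are discarded at the first bucket boundary, while `a` and `b` survive with weights 1.0 and -1.0. Pruning scans the table only once every `width` examples, which is one way to keep budget maintenance cheap; replacing the plain `count` with a weighted frequency would change only the bookkeeping loop.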
