Online Learning of k-CNF Boolean Functions

This paper revisits the problem of learning a k-CNF Boolean function from examples, for fixed k, in the context of online learning under the logarithmic loss. We give a Bayesian interpretation to one of Valiant's classic PAC learning algorithms, which we then build upon to derive three efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods and show that their cumulative log-loss is upper bounded by a polynomial in the size of each example.
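For context, the classic PAC algorithm of Valiant that the abstract refers to is the elimination algorithm for k-CNF: start with the conjunction of every clause of at most k literals, and on each positive example delete the clauses it falsifies. The sketch below is an illustrative implementation of that deterministic baseline (not the paper's probabilistic online variants); the function names and clause representation are our own.

```python
from itertools import combinations, product

def all_clauses(n, k):
    """Enumerate all non-tautological disjunctions of at most k literals
    over n Boolean variables. A clause is a frozenset of (index, polarity)
    pairs; polarity True means the literal x_i, False means its negation."""
    literals = [(i, pol) for i in range(n) for pol in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(literals, size):
            # Skip tautologies that contain both x_i and not-x_i.
            if len({i for i, _ in combo}) == size:
                clauses.add(frozenset(combo))
    return clauses

def clause_satisfied(clause, x):
    """A clause is satisfied if any of its literals evaluates to True."""
    return any(x[i] == pol for i, pol in clause)

def eliminate(clauses, x):
    """Valiant's update: on a positive example x, delete every clause
    that x falsifies, so the hypothesis only generalizes."""
    return {c for c in clauses if clause_satisfied(c, x)}

def predict(clauses, x):
    """The hypothesis is the conjunction of all surviving clauses."""
    return all(clause_satisfied(c, x) for c in clauses)

if __name__ == "__main__":
    # Hypothetical target: f(x) = (x0 OR x1) AND (NOT x2), a 2-CNF over 3 variables.
    n, k = 3, 2
    target = lambda x: (x[0] or x[1]) and not x[2]
    clauses = all_clauses(n, k)
    for x in product([False, True], repeat=n):
        if target(x):
            clauses = eliminate(clauses, x)
    print(all(predict(clauses, x) == target(x)
              for x in product([False, True], repeat=n)))
```

Because the target's clauses are never eliminated, the hypothesis can only err by predicting False on a positive example, which is the property the paper's Bayesian reinterpretation builds on.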
