Harnessing Unlabeled Examples through Iterative Application of Dynamic Markov Modeling

We describe the application of dynamic Markov modeling, a sequential bit-wise prediction technique, to labeling email corpora for the 2006 ECML/PKDD Discovery Challenge. Our technique involves: (1) converting the corpora's bag-of-words representation to a sequence of bits; (2) using logistic regression on the training data to induce an initial maximum likelihood classifier; (3) combining all test sets into one; (4) ordering the combined set by decreasing magnitude of the log-likelihood ratio; (5) iteratively applying dynamic Markov modeling (DMC) to compute successive log-likelihood estimates; (6) averaging successive estimates to form an overall estimate; (7) partitioning the combined estimates into separate results for each test set. Post-hoc experiments showed that: (a) the iterative process improved on the initial classifier in almost all cases; (b) treating each test set separately yielded nearly indistinguishable results.
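The ordering, iteration, and averaging steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `OnlineTokenModel` is a hypothetical stand-in that uses smoothed token counts in place of DMC's bit-wise Markov model, and the function names, the number of rounds, the rule for feeding predicted labels back into the model, and the choice to include the initial scores in the average are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def order_by_confidence(scores):
    """Indices sorted by decreasing magnitude of the log-likelihood ratio."""
    return sorted(range(len(scores)), key=lambda i: -abs(scores[i]))


class OnlineTokenModel:
    """Hypothetical stand-in for DMC: per-class token counts, updated sequentially."""

    def __init__(self):
        self.counts = {0: defaultdict(int), 1: defaultdict(int)}
        self.totals = {0: 0, 1: 0}

    def llr(self, tokens):
        # Crude smoothed log-likelihood ratio (class 1 vs. class 0).
        score = 0.0
        for t in tokens:
            p1 = (self.counts[1][t] + 0.5) / (self.totals[1] + 1.0)
            p0 = (self.counts[0][t] + 0.5) / (self.totals[0] + 1.0)
            score += math.log(p1 / p0)
        return score

    def update(self, tokens, label):
        for t in tokens:
            self.counts[label][t] += 1
            self.totals[label] += 1


def iterate_estimates(docs, initial_scores, rounds=3):
    """Re-score documents in confidence order, then average all estimates so far."""
    history = [list(initial_scores)]
    current = list(initial_scores)
    for _ in range(rounds):
        model = OnlineTokenModel()  # fresh model each round (an assumption)
        new_scores = [0.0] * len(docs)
        for i in order_by_confidence(current):
            # Score the document before the model has seen it.
            new_scores[i] = model.llr(docs[i])
            # Assumed feedback rule: train on the sign of the current estimate.
            model.update(docs[i], 1 if current[i] > 0 else 0)
        history.append(new_scores)
        # Average the successive estimates, including the initial classifier's.
        current = [sum(col) / len(col) for col in zip(*history)]
    return current


# Toy data: token lists and initial scores are invented for illustration.
docs = [["viagra", "cheap"], ["meeting", "agenda"],
        ["cheap", "pills"], ["agenda", "notes"]]
initial = [3.0, -2.5, 0.2, -0.1]
final = iterate_estimates(docs, initial)
```

In this toy run the two weakly scored documents inherit evidence from the confidently scored ones that precede them in the ordering, which is the intuition behind processing the most confident examples first.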