Discriminative models, not discriminative training

Suppose you are given a dataset of pairs (x, c), where c is a class variable and x is a vector of features. Given a new x, you want to predict its class. The generative i.i.d. approach to this problem posits a model family

    p(x, c | θ) = p(x | c, λ) p(c | π)    (1)

and chooses the best parameters θ = {λ, π} by maximizing (or integrating over) the joint distribution, where D denotes the data:

    p(D, θ) = p(θ) ∏