An Information Theoretic Approach to Approximating a Probability Distribution

Let $f(x)$ be a probability density function (with respect to Lebesgue measure) over a compact interval $[a,b]$, that is, $f(x) \geq 0$ for $x \in [a,b]$ and $\int_a^b f(x)\,dx = 1$. Without loss of generality, we may take the interval to be $[-1,1]$. Let $\{\varphi_i(x)\}_{i=0}^{\infty}$ be the system of (normalized) Legendre polynomials, which forms a complete orthonormal basis for the space $L^2[-1,1]$ of square-integrable functions over $[-1,1]$. Define the canonical exponential family of probability density functions of order $m$ by $p_m(x|\tau) = \exp[\sum\nolimits_{i=1}^m \tau_i \varphi_i(x) - \psi_m(\tau)]$, $x \in [-1,1]$, where $\tau = (\tau_1, \tau_2, \cdots, \tau_m)^\prime$ is an arbitrary vector in $\mathbb{R}^m$ and the normalizing function $\psi_m(\tau)$ is determined by the condition that $\int_{-1}^1 p_m(x|\tau)\,dx = 1$, so that $\exp [ {\psi _m ( \tau )} ] = \int_{ - 1}^1 {\exp [ {\sum\nolimits_{i = 1}^m {...
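
As a numerical illustration of the definitions above, the following Python sketch evaluates the normalized Legendre polynomials, obtains $\psi_m(\tau)$ from the normalization condition $\int_{-1}^1 p_m(x|\tau)\,dx = 1$, and evaluates the density $p_m(x|\tau)$. It is a minimal sketch and not part of the original work: the function names `phi`, `psi`, and `density`, the use of Gauss-Legendre quadrature, and the default of 64 nodes are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def phi(i, x):
    """Normalized Legendre polynomial of degree i on [-1, 1].

    The classical P_i satisfies int P_i^2 dx = 2 / (2i + 1), so scaling by
    sqrt((2i + 1) / 2) gives unit L^2 norm.
    """
    coeffs = np.zeros(i + 1)
    coeffs[i] = 1.0  # select P_i in the Legendre series
    return np.sqrt((2 * i + 1) / 2.0) * leg.legval(x, coeffs)


def psi(tau, n_nodes=64):
    """Normalizing function psi_m(tau), computed by quadrature (an assumption).

    Chosen so that the density below integrates to one over [-1, 1].
    """
    x, w = leg.leggauss(n_nodes)  # Gauss-Legendre nodes and weights
    s = sum(t * phi(i, x) for i, t in enumerate(tau, start=1))
    return np.log(np.sum(w * np.exp(s)))


def density(x, tau, n_nodes=64):
    """Exponential-family density p_m(x | tau) for x in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    s = sum(t * phi(i, x) for i, t in enumerate(tau, start=1))
    return np.exp(s - psi(tau, n_nodes))
```

With $\tau = 0$ the exponent vanishes, so `density` reduces to the uniform density $1/2$ on $[-1,1]$ and `psi` returns $\log 2$, which gives a quick sanity check on the normalization.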