We briefly prove the two major theorems used in this work.
Given a set of $N$ samples of a signal $X = \{x_0, \ldots, x_{N-1}\}$, let $Y = \{y_0, \ldots, y_{N-1}\}$ be the DCT coefficients of $X$. Then for any $1 \leq m \leq N-1$, the approximation
$$ p_m(t) = \frac{1}{\sqrt{N}}y_0 + \sqrt{\frac{2}{N}}\sum_{k=1}^m y_k \cos\left(\frac{k(2t+1)\pi}{2N}\right) $$of $X$ minimizes the least squared error
$$ e_m = \sum_{i=0}^{N-1} (p_m(i) - x_i)^2 $$
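As a quick numerical illustration (a minimal sketch in NumPy; the signal length and random test signal are arbitrary choices, not from the original), the DCT matrix can be built directly from the formula above and $e_m$ evaluated for each $m$. By orthogonality the error can only shrink as more coefficients are kept, reaching zero when all $N$ are used:

```python
import numpy as np

# Build the orthonormal DCT-II matrix row by row from the formula above.
N = 32
t = np.arange(N)
k = np.arange(N)[:, None]
D = np.sqrt(2.0 / N) * np.cos(k * (2 * t + 1) * np.pi / (2 * N))
D[0] = 1.0 / np.sqrt(N)  # the k = 0 row carries the 1/sqrt(N) factor

rng = np.random.default_rng(0)
x = rng.standard_normal(N)   # an arbitrary test signal
y = D @ x                    # its DCT coefficients

def e(m):
    """Least squared error of the approximation p_m built from y_0..y_m."""
    p = D[: m + 1].T @ y[: m + 1]   # p_m evaluated at t = 0..N-1
    return float(np.sum((p - x) ** 2))

errors = [e(m) for m in range(N)]
```

Keeping all $N$ coefficients reproduces $x$ exactly, so the final error is zero to machine precision.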
Proof. First, since Equation 1 represents the Discrete Cosine Transform, which is a linear map, we can rewrite it as
$$ D_m^Ty = x $$where $D_m$ is formed from the first $m+1$ rows of the DCT matrix (rows $0$ through $m$), $y$ is a column vector of the retained DCT coefficients, and $x$ is a column vector of the original samples. This system is overdetermined, so we seek the $y$ minimizing $\|D_m^Ty - x\|^2$.
To find the least squares solution, we use the normal equations; that is, we solve
$$ D_mD_m^Ty = D_mx $$and since the DCT is an orthogonal transform, the rows of $D_m$ are orthonormal, so $D_mD_m^T = I$. Therefore
$$ y = D_mx $$That is, the least squares solution is exactly the first $m+1$ DCT coefficients $y_0, \ldots, y_m$, so truncating the DCT minimizes the error $e_m$.
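This result can be checked numerically (a sketch; the sizes and random signal are illustrative assumptions): solving the overdetermined system $D_m^Ty = x$ with `np.linalg.lstsq` recovers exactly the leading DCT coefficients.

```python
import numpy as np

# Orthonormal DCT-II matrix, built from the definition.
N, m = 16, 5
t = np.arange(N)
k = np.arange(N)[:, None]
D = np.sqrt(2.0 / N) * np.cos(k * (2 * t + 1) * np.pi / (2 * N))
D[0] = 1.0 / np.sqrt(N)

rng = np.random.default_rng(1)
x = rng.standard_normal(N)
y = D @ x                # full set of DCT coefficients

Dm = D[: m + 1]          # rows 0..m of the DCT matrix
# Least squares solution of the overdetermined system D_m^T c = x;
# by the theorem it should equal y_0..y_m.
c, *_ = np.linalg.lstsq(Dm.T, x, rcond=None)
```

Because the rows of `Dm` are orthonormal, the pseudoinverse of `Dm.T` is `Dm` itself, which is exactly the step $y = D_m x$ in the proof.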
Given a set of samples of a signal $X$ such that $\mathrm{E}[X] = 0$, let $Y$ be the DCT coefficients of $X$. Then
$$ \mathrm{Var}[X] = \mathrm{E}[Y^2] $$Proof. Start by considering $\mathrm{Var}[X]$. We can rewrite this as
$$ \mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2 $$Since we are given $\mathrm{E}[X] = 0$, this simplifies to
$$ \mathrm{Var}[X] = \mathrm{E}[X^2] $$Next we express the DCT as a linear map such that $X = DY$ and rewrite the previous equation as
$$ \mathrm{Var}[X] = \mathrm{E}[(DY)^2] $$Interpreting the square as the inner product $(DY)^T(DY)$ gives
$$ \mathrm{E}[(DY)^2] = \mathrm{E}[Y^TD^TDY] $$Since $D$ is orthogonal, $D^TD = D^{-1}D = I$, so this simplifies to
$$ \mathrm{E}[Y^TD^TDY] = \mathrm{E}[Y^TY] = \mathrm{E}[Y^2] $$

Copyright © 2019 Max Ehrlich
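As an empirical check of the theorem (a sketch; the signal here is an arbitrary zero-mean random vector, not from the original), the sample variance of a centered signal matches the mean of its squared DCT coefficients, since orthogonality preserves the sum of squares:

```python
import numpy as np

# Orthonormal DCT-II matrix, as in the definition above.
N = 256
t = np.arange(N)
k = np.arange(N)[:, None]
D = np.sqrt(2.0 / N) * np.cos(k * (2 * t + 1) * np.pi / (2 * N))
D[0] = 1.0 / np.sqrt(N)

rng = np.random.default_rng(2)
x = rng.standard_normal(N)
x -= x.mean()            # enforce E[X] = 0
y = D @ x

var_x = np.var(x)        # = E[X^2] since the mean is zero
mean_y2 = np.mean(y ** 2)  # = E[Y^2]
```

The two quantities agree to machine precision, which is the statement $\mathrm{Var}[X] = \mathrm{E}[Y^2]$ for this sample.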