Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1 | Economic Theory Blog


In the post that derives the least squares estimator, we make use of the following statement:

\frac{\partial b'X'Xb}{\partial b} =2X'Xb

This post shows how one can prove this statement. Let’s start from the statement that we want to prove:

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2X'X\hat{\beta}

Note that X'X is symmetric. Hence, in order to simplify the math, we are going to label X'X as A, i.e. A := X'X.
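As a quick sanity check, a small numerical sketch (with an arbitrary, made-up design matrix X, chosen only for illustration) confirms that X'X is indeed symmetric:

```python
import numpy as np

# Hypothetical design matrix X (values are arbitrary, for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))

A = X.T @ X  # A := X'X

# X'X equals its own transpose, i.e. it is symmetric
print(np.allclose(A, A.T))  # True
```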

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \hat{\beta}_{1} & \hat{\beta}_{2} & \hdots & \hat{\beta}_{k}\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \hdots & a_{1k}\\ a_{21} & a_{22} & \hdots & a_{2k}\\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \hdots & a_{kk} \end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i1} & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i2} & \hdots & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{ik}\end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{matrix} \hat{\beta}_{1}^{2}a_{11}+\hat{\beta}_{1}\hat{\beta}_{2}a_{21}+\hdots+\hat{\beta}_{1}\hat{\beta}_{k}a_{k1}+\\ \hat{\beta}_{2}\hat{\beta}_{1}a_{12}+\hat{\beta}_{2}^{2}a_{22}+\hdots+\hat{\beta}_{2}\hat{\beta}_{k}a_{k2}+\\ \vdots \\ \hat{\beta}_{k}\hat{\beta}_{1}a_{1k}+\hat{\beta}_{k}\hat{\beta}_{2}a_{2k}+\hdots+\hat{\beta}_{k}^{2}a_{kk} \end{matrix}
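The array above can be summarized compactly as a double sum over all index pairs:

\hat{\beta}'A\hat{\beta}=\sum\limits_{i=1}^k \sum\limits_{j=1}^k \hat{\beta}_{i}\hat{\beta}_{j}a_{ij}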

Let’s compute the partial derivative of \hat{\beta}'A\hat{\beta} with respect to each element of \hat{\beta}.

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}=2\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{21}+\hdots+\hat{\beta}_{k}a_{k1}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k}

Since A is symmetric, a_{i1}=a_{1i}, so the terms pair up:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}= 2(\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k})

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{2}}= 2(\hat{\beta}_{1}a_{21}+\hat{\beta}_{2}a_{22}+\hdots+\hat{\beta}_{k}a_{2k})

\vdots

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{k}}= 2(\hat{\beta}_{1}a_{k1}+\hat{\beta}_{2}a_{k2}+\hdots+\hat{\beta}_{k}a_{kk})

Instead of stating every single equation, one can state the same using the more compact matrix notation:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}}=2A\hat{\beta}
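A short numerical sketch (using a hypothetical symmetric A and coefficient vector, chosen only for illustration) shows that stacking the k element-wise partial derivatives reproduces 2A\hat{\beta}:

```python
import numpy as np

# Hypothetical symmetric matrix A (standing in for X'X) and coefficients
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                    # symmetric by construction, like X'X
beta = rng.standard_normal(4)

# k-th partial from the derivation: 2(beta_1 a_{k1} + ... + beta_k a_{kk})
partials = np.array([2 * A[k] @ beta for k in range(4)])

# Compact matrix form: 2 A beta
print(np.allclose(partials, 2 * A @ beta))  # True
```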

Plugging X'X back in for A yields

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2X'X\hat{\beta}
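One can also spot-check the full identity numerically with central finite differences, using a made-up X and \hat{\beta} (values are arbitrary, for illustration only):

```python
import numpy as np

# Hypothetical X and beta-hat for a numerical spot check
rng = np.random.default_rng(2)
X = rng.standard_normal((8, 3))
b = rng.standard_normal(3)

f = lambda v: v @ X.T @ X @ v   # the quadratic form beta' X'X beta

# Central finite-difference gradient, one coordinate at a time
eps = 1e-6
grad = np.array([
    (f(b + eps * e) - f(b - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

# Should agree with the analytic gradient 2 X'X beta-hat
print(np.allclose(grad, 2 * X.T @ X @ b, atol=1e-5))  # True
```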

Now let’s return to the derivation of the least squares estimator.


“In God we trust; all others must bring data.” W. Edwards Deming