Derivation of the Least Squares Estimator for Beta in Matrix Notation | Economic Theory Blog
The following post derives the least squares estimator for $\beta$, which we will denote as $b$. In general, we start by mathematically formalizing the relationships we think are present in the real world and write them down in a formula.

$$y = X\beta + \epsilon \qquad (1)$$

Formula (1) depicts such a model, where $\beta$ represents the true relationship between the variables in our population. However, it is rare that we observe the whole population and with it the true relationship $\beta$. Most of the time we observe just a small fraction of what is really going on in the world. Nevertheless, even if we observe only a fraction, it is our job to estimate the true value of $\beta$ as well as possible. One way to estimate $\beta$ is the Ordinary Least Squares (OLS) estimator. In the following we are going to derive an estimator for $\beta$. The estimated values for $\beta$ will be called $b$.
Assume we collected some data and have a dataset which represents a sample of the real world. Let the following equation (2) represent the mathematical model of relationships we presume to exist in the real world and consequently in our sample.
$$y = Xb + e \qquad (2)$$
Equation (3) presents equation (2) in a more intuitively accessible way for those of you who are still getting used to reading matrix notation. It is exactly the same model as equation (2), written out element by element for $n$ observations and $k$ regressors.
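$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} \qquad (3)$$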
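The idea of the ordinary least squares (OLS) estimator is to choose $b$ in such a way that the sum of squared residuals (i.e. $\sum_{i=1}^{n} e_i^2$) in the sample is as small as possible. Mathematically, this means that in order to estimate $\beta$ we have to minimize $\sum_{i=1}^{n} e_i^2$, which in matrix notation is nothing other than $e'e$.

$$e'e = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \sum_{i=1}^{n} e_i^2 \qquad (4)$$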
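As a quick numerical illustration of the identity $e'e = \sum_i e_i^2$, here is a minimal sketch in Python with NumPy. The residual vector `e` is a made-up example and not data from this post.

```python
import numpy as np

# Hypothetical residual vector, chosen only for illustration.
e = np.array([1.0, -2.0, 0.5])

# Sum of squared residuals, written element by element.
sum_of_squares = float(np.sum(e ** 2))

# The same quantity as the inner product e'e.
inner_product = float(e @ e)

print(sum_of_squares, inner_product)
```

Both expressions compute the same scalar, which is why the minimization problem can be stated entirely in matrix notation.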
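In order to estimate $\beta$ we need to minimize $e'e$. This is what we are going to do. By definition we know that $e = y - Xb$, which follows directly from formula (2). Consequently, we can write $e'e$ as $(y - Xb)'(y - Xb)$ by simply plugging the expression $e = y - Xb$ into $e'e$. This leaves us with the following minimization problem:

$$\min_{b} \; e'e = (y - Xb)'(y - Xb) \qquad (5)$$

$$= (y' - b'X')(y - Xb) \qquad (6)$$

$$= y'y - b'X'y - y'Xb + b'X'Xb \qquad (7)$$

$$= y'y - 2b'X'y + b'X'Xb \qquad (8)$$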
It is important to understand that $b'X'y = (b'X'y)' = y'Xb$. Both terms are scalars, meaning of dimension 1×1, so the transpose of the term is the same term. This is why the two cross terms can be combined into $-2b'X'y$.
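The expansion of $(y - Xb)'(y - Xb)$ into $y'y - 2b'X'y + b'X'Xb$ and the equality of the two cross terms can be checked numerically. The following is only a sketch: `X`, `y` and `b` are randomly generated stand-ins, not data or estimates from this post.

```python
import numpy as np

# Random illustrative inputs (assumptions, not real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
b = rng.normal(size=3)

# Compact form: e'e with e = y - Xb.
e = y - X @ b
compact = e @ e

# Expanded form: y'y - 2 b'X'y + b'X'Xb.
expanded = y @ y - 2 * b @ X.T @ y + b @ X.T @ X @ b

# The two cross terms are the same scalar.
cross_1 = b @ X.T @ y   # b'X'y
cross_2 = y @ X @ b     # y'Xb

print(np.isclose(compact, expanded), np.isclose(cross_1, cross_2))
```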
In order to minimize the expression in (8), we have to differentiate it with respect to $b$ and set the derivative equal to zero. To do that, we make use of the following two mathematical statements:

$$\frac{\partial \, b'X'y}{\partial b} = X'y$$

$$\frac{\partial \, b'X'Xb}{\partial b} = 2X'Xb$$
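The two matrix derivative rules used here, $\partial(b'X'y)/\partial b = X'y$ and $\partial(b'X'Xb)/\partial b = 2X'Xb$, can be sanity-checked with finite differences. This is a rough numerical check on random inputs, not a proof. The helper `numerical_gradient` is a hypothetical function written for this illustration.

```python
import numpy as np

# Random illustrative inputs (assumptions, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
b = rng.normal(size=3)

def numerical_gradient(f, b, h=1e-6):
    """Central-difference approximation of the gradient of f at b."""
    grad = np.zeros_like(b)
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = h
        grad[j] = (f(b + step) - f(b - step)) / (2 * h)
    return grad

# Gradient of the linear term b'X'y, expected to equal X'y.
grad_linear = numerical_gradient(lambda v: v @ X.T @ y, b)

# Gradient of the quadratic term b'X'Xb, expected to equal 2X'Xb.
grad_quadratic = numerical_gradient(lambda v: v @ X.T @ X @ v, b)

linear_ok = np.allclose(grad_linear, X.T @ y, atol=1e-5)
quadratic_ok = np.allclose(grad_quadratic, 2 * X.T @ X @ b, atol=1e-5)
print(linear_ok, quadratic_ok)
```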
Using the two statements allows us to minimize expression (8).
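$$\min_{b} \; e'e = y'y - 2b'X'y + b'X'Xb \qquad (8)$$

$$\frac{\partial (e'e)}{\partial b} = -2X'y + 2X'Xb = 0 \qquad (9)$$

$$X'Xb = X'y \qquad (10)$$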
Finally, to solve expression (10) for $b$, we pre-multiply both sides of (10) by $(X'X)^{-1}$. This gives us the least squares estimator for $\beta$.

$$b = (X'X)^{-1}X'y \qquad (11)$$
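To see the closed-form estimator $b = (X'X)^{-1}X'y$ in action, here is a minimal sketch on simulated data. The sample size, the coefficient vector `beta_true` and the noise level are all made-up assumptions for illustration. The sketch solves the normal equations $X'Xb = X'y$ with `np.linalg.solve` rather than forming the inverse explicitly, which gives the same $b$ and is usually the numerically safer route.

```python
import numpy as np

# Simulated data under assumed "true" coefficients (illustrative only).
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least squares estimator: solve the normal equations X'X b = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals; at the minimum they are orthogonal to the columns of X.
e = y - X @ b

print(b)
print(np.allclose(b, b_lstsq))
```

The orthogonality $X'e = 0$ is just the first order condition $-2X'y + 2X'Xb = 0$ rearranged, so it is a convenient check that the computed $b$ really minimizes $e'e$.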
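One last mathematical point: the second order condition for a minimum requires that the matrix $X'X$ is positive definite, since the second derivative of (8) with respect to $b$ is $2X'X$. This requirement is fulfilled if $X$ has full column rank, because then $v'X'Xv = (Xv)'(Xv) > 0$ for every non-zero vector $v$.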
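The link between the rank of $X$ and the positive definiteness of $X'X$ can be illustrated with two small made-up design matrices, one with full column rank and one whose second column is a multiple of the first. The helper `is_positive_definite` is a hypothetical function for this sketch. It tests whether all eigenvalues of a symmetric matrix exceed a small tolerance.

```python
import numpy as np

# Full column rank: the two columns are linearly independent.
X_full = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])

# Rank deficient: the second column is twice the first.
X_deficient = np.array([[1.0, 2.0],
                        [1.0, 2.0],
                        [1.0, 2.0]])

def is_positive_definite(A, tol=1e-10):
    """Return True if all eigenvalues of the symmetric matrix A exceed tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

rank_full = np.linalg.matrix_rank(X_full)
rank_deficient = np.linalg.matrix_rank(X_deficient)

pd_full = is_positive_definite(X_full.T @ X_full)
pd_deficient = is_positive_definite(X_deficient.T @ X_deficient)

print(rank_full, pd_full)
print(rank_deficient, pd_deficient)
```

In the rank deficient case $X'X$ is singular, so $(X'X)^{-1}$ does not exist and the estimator is not uniquely defined.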
Congratulations, you just derived the least squares estimator $b$.