This problem involves the BLUE (best linear unbiased estimator) of a regression coefficient, and the MLE of that coefficient for a particular choice of the regressors. It is Question 8 from ISI MStat PSB 2015.

## The Problem: ISI MStat PSB 2015 Question 8

Consider the regression model:

\( y_i = bx_i + e_i \), \(1 \le i \le n \)

where the \( x_i \)’s are fixed non-zero real numbers and the \( e_i \)’s are independent random variables with mean 0 and equal variance.

(a) Consider estimators of the form \( \sum_{i=1}^{n} a_i y_i \) (where the \( a_i \)’s are non-random real numbers) that are unbiased for \( b \). Show that the least squares estimator of \( b \) has the minimum variance in this class of estimators.

(b) Suppose that the \( x_i \)’s take values \( -1 \) or \( +1 \) and the \( e_i \)’s have density \( f(t)=\frac{1}{2} e^{-|t|}, \ t \in \mathbb{R} \).

Find the maximum likelihood estimator of \( b \).

## Pre-requisites:

1. Linear estimation
2. Minimum Variance Unbiased Estimation
3. Principle of Least Squares
4. Finding MLE

## Solution:

Part (a) is the well-known result that the least squares estimator is the BLUE (best linear unbiased estimator) of the regression coefficient.

Its proof can be found online or in any standard text on linear regression.
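
For completeness, here is a sketch of the standard argument. The symbol \( \sigma^2 \) for the common error variance is notation introduced here, and the sketch assumes that variance is finite.

Unbiasedness for every \( b \) forces a constraint on the weights:

\( E\left( \sum_{i=1}^{n} a_i y_i \right) = b \sum_{i=1}^{n} a_i x_i = b \ \text{ for all } b \iff \sum_{i=1}^{n} a_i x_i = 1 \)

Since the \( e_i \)’s are independent with equal variance,

\( Var\left( \sum_{i=1}^{n} a_i y_i \right) = \sigma^2 \sum_{i=1}^{n} a_i^2 \)

By the Cauchy-Schwarz inequality,

\( 1 = \left( \sum_{i=1}^{n} a_i x_i \right)^2 \le \left( \sum_{i=1}^{n} a_i^2 \right) \left( \sum_{i=1}^{n} x_i^2 \right) \), so \( \sum_{i=1}^{n} a_i^2 \ge \frac{1}{\sum_{i=1}^{n} x_i^2} \)

with equality exactly when \( a_i \) is proportional to \( x_i \), that is, \( a_i = \frac{x_i}{\sum_{j=1}^{n} x_j^2} \). These weights give \( \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \), which is the least squares estimator of \( b \).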

Part (b) is the more interesting one.

Here each \( x_i \) is either \( +1 \) or \( -1 \), but the usual maximum likelihood approach still applies.

Let’s look at the likelihood function of \(b\) :

\(L(b) = L(b,y_i,x_i)=\frac{1}{2^n} e^{-\sum_{i=1}^{n} |y_i-bx_i|} \)

or, \( \ln{L} = c- \sum_{i=1}^{n} |y_i -bx_i| \) where \(c \) is an appropriate constant (unimportant here)

Maximizing \( \ln{L} \) w.r.t. \( b \) is the same as minimizing \( \sum_{i=1}^{n} |y_i - bx_i| \) w.r.t. \( b \).

Note that \( |x_i|=1 \). Let us define \( t_i =\frac{y_i}{x_i} \).

Here’s the catch now: since \( \frac{1}{|x_i|} = 1 \), we have \( \sum_{i=1}^{n} |y_i-bx_i|= \sum_{i=1}^{n} |y_i-bx_i| \cdot \frac{1}{|x_i|} = \sum_{i=1}^{n} \left| \frac{y_i}{x_i}-b \right| =\sum_{i=1}^{n} |t_i - b| \).

Now think back to when you took your first baby steps in statistics. Do you remember the result that “Mean deviation about the median is the least”?

So, \( \sum_{i=1}^{n} |t_i - b| \) is minimized for \( b= \) Median\( (t_i) \).
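
In case that result is not fresh in memory, here is a short sketch of why it holds. The function name \( g \) is introduced here only for illustration.

\( g(b) = \sum_{i=1}^{n} |t_i - b| \)

For any \( b \) that is not equal to one of the \( t_i \)’s,

\( g'(b) = \#\{ i : t_i < b \} - \#\{ i : t_i > b \} \)

So \( g \) is continuous and piecewise linear, with negative slope while fewer than half of the \( t_i \)’s lie below \( b \) and positive slope once more than half of them do. Hence \( g \) attains its minimum at a median of the \( t_i \)’s. When \( n \) is even, every \( b \) between the two middle order statistics gives the same minimum value, so the minimizer need not be unique; the usual median is one valid choice.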

Thus, the MLE of \( b \) is the median of \( \left\{ \frac{y_1}{x_1},\frac{y_2}{x_2},\ldots,\frac{y_n}{x_n} \right\} \).
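
As a quick sanity check, the result can be tried out numerically. The following is a minimal sketch in plain Python, not part of the original solution: the function names `laplace_slope_mle` and `abs_loss`, the sample size, the true value `b_true` and the grid are all hypothetical choices made for illustration. It simulates data with \( x_i = \pm 1 \) and double exponential errors, computes the median of the ratios, and compares the sum of absolute deviations there against a grid of other candidate values of \( b \).

```python
import random
import statistics


def laplace_slope_mle(x, y):
    """Median of y_i / x_i; valid as the MLE only when every x_i is +1 or -1."""
    if any(abs(xi) != 1 for xi in x):
        raise ValueError("this shortcut assumes every x_i is +1 or -1")
    t = [yi / xi for xi, yi in zip(x, y)]  # t_i = y_i / x_i
    return statistics.median(t)


def abs_loss(b, x, y):
    """Sum of |y_i - b * x_i|, i.e. the negative log-likelihood up to a constant."""
    return sum(abs(yi - b * xi) for xi, yi in zip(x, y))


# Simulated data (illustrative values only).
random.seed(0)
n = 201
b_true = 1.5
x = [random.choice([-1, 1]) for _ in range(n)]
# The difference of two independent Exp(1) variables has density (1/2) e^{-|t|}.
e = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(n)]
y = [b_true * xi + ei for xi, ei in zip(x, e)]

b_hat = laplace_slope_mle(x, y)

# Compare the loss at b_hat with the loss on a grid of nearby candidates.
grid = [b_hat + k / 100 for k in range(-200, 201)]
best_on_grid = min(abs_loss(b, x, y) for b in grid)

print("estimate:", b_hat)
print("no grid point beats the median:", abs_loss(b_hat, x, y) <= best_on_grid + 1e-9)
```

A grid search only checks finitely many candidates, so this illustrates the result rather than proving it.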

## Food For Thought:

In classical regression models we assume the \( X_i \)’s are non-stochastic. But is that assumption always valid? Not at all.

In the case of stochastic \( X_i \)’s, there is a separate branch of regression called **Stochastic Regression**, which deals with a slightly different analysis and estimates.

I urge interested readers to go through this topic in any book or paper.

You may refer to Montgomery, Draper & Smith, etc.
