# ISI MStat PSB 2018 Problem 9 | Regression Analysis

This is a very simple sample problem from ISI MStat PSB 2018 Problem 9. It is mainly based on ordinary least squares (OLS) and maximum likelihood estimation of regression parameters. Try it!

## Problem - ISI MStat PSB 2018 Problem 9

Suppose $$(y_i,x_i)$$ satisfies the regression model,

$$y_i= \alpha + \beta x_i + \epsilon_i$$ for $$i=1,2,\ldots,n,$$

where $$\{ x_i : 1 \le i \le n \}$$ are fixed constants and $$\{ \epsilon_i : 1 \le i \le n \}$$ are i.i.d. $$N(0, \sigma^2)$$ errors, where $$\alpha, \beta$$ and $$\sigma^2 (>0)$$ are unknown parameters.

(a) Let $$\tilde{\alpha}$$ denote the least squares estimate of $$\alpha$$ obtained assuming $$\beta=5$$. Find the mean squared error (MSE) of $$\tilde{\alpha}$$ in terms of model parameters.

(b) Obtain the maximum likelihood estimator of this MSE.

### Prerequisites

Normal Distribution

Ordinary Least Square Estimates

Maximum Likelihood Estimates

## Solution :

This problem is simple enough.

For the given model, $$y_i= \alpha + \beta x_i + \epsilon_i$$ for $$i=1,\ldots,n$$.

The scenario is even simpler here since it is given that $$\beta=5$$, so our model reduces to

$$y_i= \alpha + 5x_i + \epsilon_i$$, where the $$\epsilon_i$$'s are i.i.d. $$N(0, \sigma^2)$$.

Now we know that the Ordinary Least Squares (OLS) estimate of $$\alpha$$ is

$$\tilde{\alpha} = \bar{y} - \tilde{\beta}\bar{x}$$, where $$\tilde{\beta}$$ is, in general, the OLS estimate of $$\beta$$. But here $$\beta=5$$ is known, so minimizing $$\sum_{i=1}^{n}(y_i-\alpha-5x_i)^2$$ over $$\alpha$$ directly gives

$$\tilde{\alpha}= \bar{y} - 5\bar{x}$$. Again,

$$E(\tilde{\alpha})=E( \bar{y}-5\bar{x})=\alpha+(\beta-5)\bar{x}$$, hence $$\tilde{\alpha}$$ is a biased estimator of $$\alpha$$ with $$Bias_{\alpha}(\tilde{\alpha})= (\beta-5)\bar{x}$$.

So, the Mean Squared Error, MSE of $$\tilde{\alpha}$$ is,

$$MSE_{\alpha}(\tilde{\alpha})= E(\tilde{\alpha} - \alpha)^2=Var(\tilde{\alpha}) + {Bias^2}_{\alpha}(\tilde{\alpha})$$

$$= \frac{\sigma^2}{n}+ \bar{x}^2(\beta-5)^2$$

[ as it follows clearly from the model that $$y_i \sim N( \alpha +\beta x_i , \sigma^2)$$ independently and the $$x_i$$'s are non-stochastic, so $$Var(\bar{y})=\frac{\sigma^2}{n}$$ ].
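As a quick sanity check, the MSE formula can be reproduced by a Monte Carlo simulation. This is only a sketch: the values of $$n, \alpha, \beta, \sigma$$ and the design points below are arbitrary illustrative choices, not part of the problem.

```python
import numpy as np

# Monte Carlo check of MSE(alpha_tilde) = sigma^2/n + xbar^2 (beta - 5)^2.
# All parameter values here are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, alpha, beta, sigma = 20, 2.0, 7.0, 1.5
x = np.linspace(0.0, 1.0, n)          # fixed (non-stochastic) covariates
xbar = x.mean()

reps = 200_000
eps = rng.normal(0.0, sigma, size=(reps, n))
y = alpha + beta * x + eps            # one row per simulated sample

alpha_tilde = y.mean(axis=1) - 5.0 * xbar   # OLS estimate with beta fixed at 5

mse_empirical = np.mean((alpha_tilde - alpha) ** 2)
mse_formula = sigma**2 / n + xbar**2 * (beta - 5.0) ** 2
print(mse_empirical, mse_formula)     # the two numbers should be close
```

Note that the bias term $$\bar{x}^2(\beta-5)^2$$ dominates here because the assumed true $$\beta=7$$ is far from 5; setting `beta = 5.0` in the sketch collapses the MSE to $$\sigma^2/n$$.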

(b) The last part follows directly from the note I provided at the end of part (a):

since $$y_i \sim N( \alpha + \beta x_i , \sigma^2 )$$, we have to find the maximum likelihood estimators of $$\sigma^2$$ and $$\beta$$ and then use the invariance property of the MLE (in the MSE obtained in (a)). I leave it as an exercise!! Finish it yourself!

## Food For Thought

Suppose you don't even know the value of $$\beta$$. What will be the MSE of $$\tilde{\alpha}$$ in that case?

Also, find the OLS estimate of $$\beta$$ (you have already done it for $$\alpha$$), and then find the MLEs of both $$\alpha$$ and $$\beta$$. Are the OLS estimates identical to the MLEs you obtained? Which assumption induces this coincidence?? What do you think!!

# ISI MStat PSB 2013 Problem 4 | Linear Regression

This is a sample problem from ISI MStat PSB 2013 Problem 4. It is based on the simple linear regression model: finding the estimates and the MSEs. But think over the "Food for Thought"; any kind of discussion will be appreciated. Give it a try!

## Problem - ISI MStat PSB 2013 Problem 4

Consider $$n$$ independent observations $$\{ (x_i,y_i) : 1 \le i \le n \}$$ from the model

$$Y= \alpha + \beta x + \epsilon ,$$

where $$\epsilon$$ is normal with mean 0 and variance $$\sigma^2$$. Let $$\hat{\alpha}, \hat{\beta}$$ and $$\hat{\sigma}^2$$ be the maximum likelihood estimators of $$\alpha , \beta$$ and $$\sigma^2$$, respectively. Let $$v_{11}, v_{22}$$ and $$v_{12}$$ be the estimated values of $$Var(\hat{\alpha}), Var(\hat{\beta})$$ and $$Cov ( \hat{\alpha}, \hat{\beta})$$, respectively.

(a) What is the estimated mean of $$Y$$ when $$x=x_o$$? Estimate the mean squared error of this estimator.

(b) What is the predicted value of $$Y$$ when $$x=x_o$$? Estimate the mean squared error of this predictor.

### Prerequisites

Linear Regression

Method of Least Squares

Maximum likelihood Estimators.

Mean Squared Error.

## Solution :

Here for the given model,

we have the random errors $$\epsilon \sim N(0, \sigma^2)$$, and the maximum likelihood estimators (MLEs) of the model parameters are given by $$\hat{\alpha}, \hat{\beta}$$ and $$\hat{\sigma}^2$$. The interesting thing about this model is that, since the random errors $$\epsilon$$ are Gaussian random variables, the Ordinary Least Squares (OLS) estimates of $$\alpha$$ and $$\beta$$ are identical to their maximum likelihood estimators (which are already given!). How?? Verify it yourself once and remember it henceforth.

So, here $$\hat{\alpha}$$ and $$\hat{\beta}$$ are also the OLS estimates of the respective model parameters.

And by the Gauss-Markov theorem, the OLS estimates are the BLUEs (Best Linear Unbiased Estimators) of the model parameters. So, here $$\hat{\alpha}$$ and $$\hat{\beta}$$ are also unbiased estimators of $$\alpha$$ and $$\beta$$ respectively. (Note, though, that $$\hat{\sigma}^2$$, the MLE which divides the residual sum of squares by $$n$$, is a biased estimator of $$\sigma^2$$.)
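For reference, the estimated values $$v_{11}, v_{22}, v_{12}$$ come from the standard sampling variance formulas for the simple linear regression estimators (a well-known result, stated here with $$S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2$$), with $$\sigma^2$$ replaced by $$\hat{\sigma}^2$$:

```latex
Var(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}, \qquad
Var(\hat{\alpha}) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right), \qquad
Cov(\hat{\alpha}, \hat{\beta}) = -\frac{\sigma^2 \bar{x}}{S_{xx}}
```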

(a) Now we need to find the estimated mean Y given $$x=x_o$$ ,

$$\hat{E}( Y \mid x=x_o)= \hat{\alpha} + \hat{\beta} x_o$$ is the estimated mean of $$Y$$ given $$x=x_o$$.

Now, since the given MLEs (OLSEs) are also unbiased for their respective parameters,

$$MSE( \hat{E}( Y \mid x=x_o))=MSE(\hat{\alpha} + \hat{\beta} x_o)=E(\hat{\alpha} + \hat{\beta} x_o-(\alpha + \beta x_o))^2$$

$$=E(\hat{\alpha} + \hat{\beta} x_o-E(\hat{\alpha} + \hat{\beta} x_o))^2$$

$$=Var( \hat{\alpha} + \hat{\beta} x_o)$$

$$= Var(\hat{\alpha})+2x_o Cov(\hat{\alpha}, \hat{\beta})+ {x_o}^2Var(\hat{\beta})$$

So, $$MSE( \hat{E}( Y \mid x=x_o))= v_{11} +2x_o v_{12} + {x_o}^2 v_{22}$$.

(b) Similarly, when $$x=x_o$$, the predicted value of $$Y$$ is

$$\hat{Y} = \hat{\alpha} + \hat{\beta} x_o$$, while the new response itself is $$Y = \alpha + \beta x_o + \epsilon$$, so the prediction error carries the extra error term $$\epsilon$$.

Using arguments similar to those in (a), together with the independence of the new error $$\epsilon$$ from the estimators (which are based on the original sample), verify that

$$MSE(\hat{Y})= v_{11}+ 2x_o v_{12} + {x_o}^2 v_{22}+\hat{\sigma}^2$$. Hence we are done!
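Both estimated MSEs can be computed numerically. The sketch below simulates data purely for illustration (the true parameters and the point $$x_o$$ are arbitrary choices) and builds $$v_{11}, v_{22}, v_{12}$$ by plugging $$\hat{\sigma}^2$$ into the standard variance formulas:

```python
import numpy as np

# Sketch: v11, v22, v12 and the two estimated MSEs on simulated data.
# All true parameter values and x0 below are arbitrary illustrations.
rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0.0, 10.0, n)                  # covariates
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)    # simulated responses

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)

beta_hat = np.sum((x - xbar) * (y - ybar)) / Sxx           # MLE = OLS slope
alpha_hat = ybar - beta_hat * xbar                         # MLE = OLS intercept
sigma2_hat = np.mean((y - alpha_hat - beta_hat * x) ** 2)  # MLE (divides by n)

# Estimated variances/covariance: sigma^2 replaced by its MLE in
# Var(beta_hat) = sigma^2/Sxx, Var(alpha_hat) = sigma^2 (1/n + xbar^2/Sxx),
# Cov(alpha_hat, beta_hat) = -sigma^2 xbar/Sxx.
v22 = sigma2_hat / Sxx
v11 = sigma2_hat * (1.0 / n + xbar ** 2 / Sxx)
v12 = -sigma2_hat * xbar / Sxx

x0 = 5.0                                        # hypothetical new point
mse_mean = v11 + 2 * x0 * v12 + x0 ** 2 * v22   # part (a)
mse_pred = mse_mean + sigma2_hat                # part (b): extra sigma^2 term
print(mse_mean, mse_pred)
```

A tidy by-product of the algebra: $$v_{11} +2x_o v_{12} + {x_o}^2 v_{22} = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{(x_o - \bar{x})^2}{S_{xx}} \right)$$, which makes it explicit that the estimated mean is most precise near $$\bar{x}$$.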

## Food For Thought

Now, can you explain why the Maximum Likelihood Estimators and the Ordinary Least Squares estimates are identical when the model assumes Gaussian errors??

Wait!! Not done yet. The main course is served below !!

In a game of darts, a thrower throws a dart randomly and uniformly in a unit circle. Let $$\theta$$ be the angle between the horizontal axis and the line segment joining the dart to the center. Now consider a random variable $$Z$$: when the thrower is left-handed, $$Z=-1$$, and when the thrower is right-handed, $$Z=1$$. Assume that left-handed and right-handed throwers are equally likely (is it really equally likely in a real scenario??). Can you construct a regression model for regressing $$\theta$$ on $$Z$$?

Think over it, if you want to discuss, we can do that too !!

# Invariant Regression Coefficient | ISI MStat 2019 PSB Problem 8

This is a problem from the ISI MStat Examination, 2019. It tests one's familiarity with simple and multiple linear regression models and the estimation of model parameters, and is based on an invariant regression coefficient.

## The Problem- Invariant Regression Coefficient

Suppose $$\{ (x_i,y_i,z_i):i=1,2,\ldots,n \}$$ is a set of trivariate observations on three variables $$X,Y,Z$$, where $$z_i=0$$ for $$i=1,2,\ldots,n-1$$ and $$z_n=1$$. Suppose the least squares linear regression equation of $$Y$$ on $$X$$ based on the first $$n-1$$ observations is $$y=\hat{\alpha_0}+\hat{\alpha_1}x$$ and the least squares linear regression equation of $$Y$$ on $$X$$ and $$Z$$ based on all $$n$$ observations is $$y=\hat{\beta_0}+\hat{\beta_1}x+\hat{\beta_2}z$$. Show that $$\hat{\alpha_1}=\hat{\beta_1}$$.

## Prerequisites

1. Knowing how to estimate the parameters of a linear regression model (in the least squares sense).

2. Brief idea about multiple linear regression.

## Solution

Based on the first $$n-1$$ observations, since $$z_i=0$$, we consider a typical simple linear regression of $$Y$$ on $$X$$.

Thus, the least squares estimate is given by $$\hat{\alpha_1}=\frac{\sum_{i=1}^{n-1} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n-1} (x_i-\bar{x})^2}$$, where $$\bar{x}$$ and $$\bar{y}$$ are the means over the first $$n-1$$ observations.

And in the second case, we have:

$$y_1=\beta_0+\beta_1 x_1+\epsilon_1$$

$$y_2=\beta_0+\beta_1 x_2+ \epsilon_2$$

$$\vdots$$

$$y_n=\beta_{0}+\beta_1 x_n+\beta_2+ \epsilon_n$$

Thus, the error sum of squares for this model is given by:

$$SSE=\sum_{i=1}^{n-1} (y_i-\beta_0-\beta_1 x_i)^2+(y_n-\beta_1 x_n -\beta_0 -\beta_2)^2$$ , as $$z_n=1$$.

By differentiating SSE with respect to $$\beta_2$$, at the optimal value, we must have:

$$\hat{\beta}_2 = y_n -\hat{\beta_1}x_n-\hat{\beta_0}$$

That is, the last term of SSE must vanish to attain optimality.

So, it is again equivalent to minimize

$$\sum_{i=1}^{n-1} (y_i-\beta_0-\beta_1 x_i)^2$$ with respect to $$\beta_{0} ,\beta_{1}$$

This is nothing but the simple linear regression model again, and thus $$\hat{\beta_1}=\hat{\alpha_1}$$, and furthermore $$\hat{\beta_0}=\hat{\alpha_0}$$.
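The invariance can also be checked numerically. The sketch below simulates arbitrary data, fits the simple regression on the first $$n-1$$ points and the multiple regression on all $$n$$ points, and compares the two slope estimates:

```python
import numpy as np

# Numerical check: the slope of Y on X from the first n-1 points equals
# the slope of Y on (X, Z) from all n points, when z_i = 0 except z_n = 1.
# Data are simulated purely for illustration.
rng = np.random.default_rng(2)
n = 25
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=n)
z = np.zeros(n)
z[-1] = 1.0                                    # z_i = 0 for i < n, z_n = 1

# Simple regression on the first n-1 observations
xs, ys = x[:-1], y[:-1]
a1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)

# Multiple regression of y on (1, x, z) using all n observations
X = np.column_stack([np.ones(n), x, z])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(a1, b1)   # the two slope estimates coincide (up to rounding)
```

The fitted $$\hat{\beta_2}$$ in this sketch also satisfies $$\hat{\beta_2}= y_n -\hat{\beta_1}x_n-\hat{\beta_0}$$, i.e. the residual at the $$n$$-th point is exactly zero, just as the differentiation argument above predicts.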

## Food For Thought

Suppose you have two sets of independent samples, say $$\{ (y_1,x_1), \ldots,(y_{n_1},x_{n_1}) \}$$ and $$\{ (y_{n_1 +1},x_{n_1 +1} ) ,\ldots,(y_{n_1 + n_2} ,x_{n_1 + n_2} ) \}$$.

Now you want to fit 2 models to these samples:

$$y_i=\beta_0 + \beta_1 x_i + \epsilon_i$$ for $$i=1,2,\ldots,n_1$$

and

$$y_i=\gamma_0 + \gamma_1 x_i + \epsilon_i$$ for $$i=n_1 +1 ,\ldots ,n_1 + n_2$$

Can you write these two models as a single model?

After that, considering all the assumptions of linear regression to be true (if you are not aware of these assumptions, you may browse through any regression book or search the internet), is it justifiable to infer $$\beta_1 = \gamma_1$$?