
Restricted Regression Problem | ISI MStat 2017 PSB Problem 7

This is a regression problem in which we use ordinary least squares to estimate a parameter in a restricted scenario. This is ISI MStat 2017 PSB Problem 7.

Problem

Consider independent observations \(\left\{\left(y_{i}, x_{1 i}, x_{2 i}\right): 1 \leq i \leq n\right\}\) from the regression model
$$
y_{i}=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}+\epsilon_{i}, i=1, \ldots, n
$$ where \(x_{1 i}\) and \(x_{2 i}\) are scalar covariates, \(\beta_{1}\) and \(\beta_{2}\) are unknown scalar
coefficients, and \(\epsilon_{i}\) are uncorrelated errors with mean 0 and variance \(\sigma^{2}>0\). Instead of using the correct model, we obtain an estimate \(\hat{\beta_{1}}\) of \(\beta_{1}\) by minimizing
$$
\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}
$$ Find the bias and mean squared error of \(\hat{\beta}_{1}\).

Prerequisites

Ordinary least squares, linearity of expectation, variance of a linear combination of uncorrelated random variables, and the decomposition \( MSE = \text{Variance} + \text{Bias}^2 \).

Solution

This is a sort of restricted regression problem: perhaps we have already tested and accepted that \(\beta_2 = 0\), so we estimate \(\beta_1\) as if \(\beta_2 = 0\) even though the data actually come from the full model. That is the statistical interest of this problem, and we will see how the restriction shows up in the estimate of \(\beta_1\).

Let's start by fixing some notation: for sequences \(a\) and \(b\), write
\( s_{a,b} = \sum_{i=1}^{n} a_{i} b_{i} \)

Let's minimize \( L(\beta_1) = \sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}\) by differentiating w.r.t \(\beta_1\) and equating to 0.

\( \frac{dL(\beta_1)}{d\beta_1} = \frac{d}{d\beta_1}\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2} = 0\)

\( \Rightarrow \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0 \)

\( \Rightarrow \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}} \)
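
As a quick sketch (not part of the original solution), this is how the restricted estimate could be computed with NumPy; the data below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data (for illustration only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Restricted OLS on x1 alone: beta1_hat = s_{x1,y} / s_{x1,x1}
beta1_hat = np.dot(x1, y) / np.dot(x1, x1)
print(beta1_hat)
```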

From the given conditions, \( E(Y_{i})=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}\).

\( \Rightarrow E(s_{x_{1},Y}) = \beta_{1}s_{x_{1},x_{1}} +\beta_{2} s_{x_{1},x_{2}} \).

Since the \(x\)'s are constants, \( E(\hat{\beta_1}) = \beta_{1} +\beta_{2} \frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}} \).

\( \operatorname{Bias}(\hat{\beta_1}) = E(\hat{\beta_1}) - \beta_{1} = \beta_{2} \frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}} \).

Thus the closer \( \beta_2 \) is to \(0\), the closer the bias is to \(0\).
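
Here is a minimal Monte Carlo sketch to check the bias formula; the covariates, parameter values, and normal errors are all hypothetical choices (the derivation itself needs no normality).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed covariates and true parameters (assumptions)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta1, beta2, sigma = 1.5, 0.8, 1.0

reps = 20000
est = np.empty(reps)
for r in range(reps):
    y = beta1 * x1 + beta2 * x2 + sigma * rng.normal(size=n)
    est[r] = np.dot(x1, y) / np.dot(x1, x1)  # restricted OLS estimate

empirical_bias = est.mean() - beta1
theoretical_bias = beta2 * np.dot(x1, x2) / np.dot(x1, x1)
print(empirical_bias, theoretical_bias)  # the two should be close
```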

From the given conditions, the errors \( \epsilon_{i} = Y_{i} - \beta_{1} x_{1 i} - \beta_{2} x_{2 i}\) are uncorrelated with mean \(0\) and variance \(\sigma^2\); no particular distribution is assumed.

Since the \(x\)'s are constants, \( \hat{\beta_1} = \frac{s_{x_{1},Y}}{s_{x_{1},x_{1}}}\) is a linear combination of the uncorrelated \(Y_i\)'s, so

\(Var(\hat{\beta_1}) = \frac{\sum_{i=1}^{n} x_{1i}^2 Var(Y_{i})}{s_{x_1, x_1}^2} = \frac{\sigma^2 \, s_{x_1, x_1}}{s_{x_1, x_1}^2} = \frac{\sigma^2}{s_{x_1, x_1}} \)

\( MSE(\hat{\beta_1}) = \text{Variance} + \text{Bias}^2 = \frac{\sigma^2}{s_{x_1, x_1}} + \beta_{2}^2\left(\frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}}\right)^2\)

Observe that the MSE, too, is minimized when \(\beta_2 = 0\).
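
Continuing the simulation sketch above, the variance and MSE formulas can be checked the same way:

```python
# Continuation of the previous sketch: est, x1, beta1, sigma,
# and theoretical_bias are as defined there.
empirical_var = est.var()
theoretical_var = sigma**2 / np.dot(x1, x1)

empirical_mse = np.mean((est - beta1) ** 2)
theoretical_mse = theoretical_var + theoretical_bias**2

print(empirical_var, theoretical_var)  # should be close
print(empirical_mse, theoretical_mse)  # should be close
```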
