Get inspired by the success stories of our students in IIT JAM MS, ISI MStat, CMI MSc Data Science. Learn More

This problem is a regression problem, where we use the ordinary least square methods, to estimate the parameters in a restricted case scenario. This is ISI MStat 2017 PSB Problem 7.

Consider independent observations \({\left(y_{i}, x_{1 i}, x_{2 i}\right): 1 \leq i \leq n}\) from the regression model

$$

y_{i}=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}+\epsilon_{i}, i=1, \ldots, n

$$ where \(x_{1 i}\) and \(x_{2 i}\) are scalar covariates, \(\beta_{1}\) and \(\beta_{2}\) are unknown scalar

coefficients, and \(\epsilon_{i}\) are uncorrelated errors with mean 0 and variance \(\sigma^{2}>0\). Instead of using the correct model, we obtain an estimate \(\hat{\beta_{1}}\) of \(\beta_{1}\) by minimizing

$$

\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}

$$ Find the bias and mean squared error of \(\hat{\beta}_{1}\).

- Ordinary Least Square Method
- Minimizing the Square Loss Error Function
- Multiple Regression
- Mean Square Error = \(\text{Variance} + \text{Bias}^2\).
- Bias

It is sort of a restricted regression problem because maybe we have tested the fact that \(\beta_2 = 0\). Hence, we are interested in the estimate of \(\beta_1\) given \(\beta_2 = 0\). This is essentially the statistical significance of this problem, and we will see how it turns out in the estimate of \(\beta_1\).

Let's start with some notational nomenclature.

\( \sum_{i=1}^{n} a_{i} b_{i} = s_{a,b} \)

Let's minimize \( L(\beta_1) = \sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}\) by differentiating w.r.t \(\beta_1\) and equating to 0.

\( \frac{dL(\beta_1)}{d\beta_1}\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2} = 0\)

\( \Rightarrow \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0 \)

\( \Rightarrow \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}} \)

From, the given conditions, \( E(Y_{i})=\beta_{1} X_{1 i}+\beta_{2} X_{2 i}\).

\( \Rightarrow E(s_{X_{1},Y}) = \beta_{1}s_{X_{1},X_{1}} +\beta_{2} s_{X_{1},X_{2}} \).

Since, \(x's\) are constant, \( E(\hat{\beta_1}) = \beta_{1} +\beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

\( Bias(\hat{\beta_1}) = \beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

Thus, observe that the more \( \beta_2 \) is close to 0, the more bias is close to 0.

From, the given conditions,

\( Y_{i} - \beta_{1} X_{1 i} - \beta_{2} X_{2 i}\) ~ Something\(( 0 , \sigma^2\)).

\( \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}}\) ~ Something\(( E(\hat{\beta_{1}}) , Var(\hat{\beta_1}))\).

\(Var(\hat{\beta_1}) = \frac{\sum_{i=1}^{n} x_{1i}^2 Var(Y_{i})}{s_{X_1, X_1}^2} = \frac{\sigma^2}{s_{X_1, X_1}} \)

\( MSE(\hat{\beta_1}) = Variance + \text{Bias}^2 = \frac{\sigma^2}{s_{X_1, X_1}} + \beta_{2}^2(\frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}})^2\)

Observe, that even the MSE is minimized if \(\beta_2 = 0\).

This problem is a regression problem, where we use the ordinary least square methods, to estimate the parameters in a restricted case scenario. This is ISI MStat 2017 PSB Problem 7.

Consider independent observations \({\left(y_{i}, x_{1 i}, x_{2 i}\right): 1 \leq i \leq n}\) from the regression model

$$

y_{i}=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}+\epsilon_{i}, i=1, \ldots, n

$$ where \(x_{1 i}\) and \(x_{2 i}\) are scalar covariates, \(\beta_{1}\) and \(\beta_{2}\) are unknown scalar

coefficients, and \(\epsilon_{i}\) are uncorrelated errors with mean 0 and variance \(\sigma^{2}>0\). Instead of using the correct model, we obtain an estimate \(\hat{\beta_{1}}\) of \(\beta_{1}\) by minimizing

$$

\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}

$$ Find the bias and mean squared error of \(\hat{\beta}_{1}\).

- Ordinary Least Square Method
- Minimizing the Square Loss Error Function
- Multiple Regression
- Mean Square Error = \(\text{Variance} + \text{Bias}^2\).
- Bias

It is sort of a restricted regression problem because maybe we have tested the fact that \(\beta_2 = 0\). Hence, we are interested in the estimate of \(\beta_1\) given \(\beta_2 = 0\). This is essentially the statistical significance of this problem, and we will see how it turns out in the estimate of \(\beta_1\).

Let's start with some notational nomenclature.

\( \sum_{i=1}^{n} a_{i} b_{i} = s_{a,b} \)

Let's minimize \( L(\beta_1) = \sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}\) by differentiating w.r.t \(\beta_1\) and equating to 0.

\( \frac{dL(\beta_1)}{d\beta_1}\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2} = 0\)

\( \Rightarrow \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0 \)

\( \Rightarrow \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}} \)

From, the given conditions, \( E(Y_{i})=\beta_{1} X_{1 i}+\beta_{2} X_{2 i}\).

\( \Rightarrow E(s_{X_{1},Y}) = \beta_{1}s_{X_{1},X_{1}} +\beta_{2} s_{X_{1},X_{2}} \).

Since, \(x's\) are constant, \( E(\hat{\beta_1}) = \beta_{1} +\beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

\( Bias(\hat{\beta_1}) = \beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

Thus, observe that the more \( \beta_2 \) is close to 0, the more bias is close to 0.

From, the given conditions,

\( Y_{i} - \beta_{1} X_{1 i} - \beta_{2} X_{2 i}\) ~ Something\(( 0 , \sigma^2\)).

\( \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}}\) ~ Something\(( E(\hat{\beta_{1}}) , Var(\hat{\beta_1}))\).

\(Var(\hat{\beta_1}) = \frac{\sum_{i=1}^{n} x_{1i}^2 Var(Y_{i})}{s_{X_1, X_1}^2} = \frac{\sigma^2}{s_{X_1, X_1}} \)

\( MSE(\hat{\beta_1}) = Variance + \text{Bias}^2 = \frac{\sigma^2}{s_{X_1, X_1}} + \beta_{2}^2(\frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}})^2\)

Observe, that even the MSE is minimized if \(\beta_2 = 0\).

Cheenta is a knowledge partner of Aditya Birla Education Academy

Advanced Mathematical Science. Taught by olympians, researchers and true masters of the subject.

JOIN TRIALAcademic Programs

Free Resources

Why Cheenta?

I guess the estimate of Beta 1 here is slightly incorrect .

beta 1 estimate is wrong dude !!!

beta 1 estimate is wrong dude.

Mistake in taking derivative..Whole solution gone wrong..Yare yare daze

Mistake in taking derivative. Whole solution gone wrong..

Yes srijit da!!

Estimate value of β1 is not true