ISI MStat Entrance 2021 Problems and Solutions PSA & PSB

This post contains ISI MStat Entrance PSA and PSB 2021 Problems and Solutions that can be very helpful and resourceful for your ISI MStat Preparation.

Download Paper
PSA Paper
PSB Paper
ISI MStat 2021 PSA Answer Key and Solutions

Click on the links to learn about the detailed solution. (Coming Soon)

1. 49 (Rolle's Theorem)

2. 2 (4 - number of linear constraints)

3. k = 2 (a = -d, and form a biquadratic which has two real solutions)

4. 0 (divide by $x^4$, use the $\frac{\sin x}{x}$ limit result)

5. $\frac{p}{q}$ must be a rational number. (The product must be a rational number.)

6. $\alpha = 1, \beta =1$ (Use sandwich theorem on an easy inequality on ceiling of x)

7. $\frac{2n}{n+1}$ (Use geometry and definite integration)

8. $2+ \sqrt{5}$ (Just write down the Pythagorean theorem in terms of the variables and solve)

9. 10 (Use the roots of unity)

10. $\frac{3}{8}$ (Find out the cases when it is non zero, and use classical probability)

11. $\frac{(n+1)^n}{n!}$ (Use ${{n} \choose {r}}={{n-1} \choose {r-1}}+{{n-1} \choose r}$)

12. $P(\pi)$ is even for all $\pi$. (Observe that there is one more odd number than even numbers, so there will be at least one odd-odd match)

13. is equal to 12. (The $(i,j)$th element is $a_{ii}b_{ij}c_{jj}$. Then use the geometric series.)

14. 160 (Use the fact that any permutation can be written as a composition of transpositions. Observe that the given condition is equivalent to saying that 2 transpositions are not possible)

15. $m_t < \infty$ for all $t \geq 0$ (All monotone functions are bounded on $[a,b]$)

16. $H(x) = \frac{1-F(-x)+ F(x)}{2}$ (If $F(x)$ is right continuous, then $F(-x)$ is left continuous.)

17. $\frac{1}{25}$ (Use the distribution function of $\frac{X}{Y}$)

18. 3 (Find the distribution of order statistic, and find the expectation)

19. (II) but not (I) (If $F(x)$ is right continuous, then $F(-x)$ is left continuous.)

20. $20\lambda^4$ (Use the gamma integral to find $E(X_{1}^4)$.)

21. The two new observations are 15 and 5. (Use the given conditions to set up two linear equations in the two observations.)

22. It is less than 2. (Express the beta coefficients in terms of the sample covariance and sample variances, and compare)

23. 4:3 (Use Bayes' Theorem)

24. The two-sample t-test statistic and the ANOVA statistics yield the same power for any non-zero value of $\mu_1 - \mu_2$ and for any $n,m$. (Both test statistics are one-to-one functions of each other)

25. $t^3 - 1 - 2(t-1)$

26. $\frac{2 \sum_{i=1}^{n} X_i}{n(n+1)}$ (Use the invariance property of MLE)

27. $Y_1^2 + Y_2^2 + Y_1Y_2$ (Write the bivariate normal distribution in terms of $Y_1, Y_2$ and use Neyman Factorization Theorem.)

28. can be negative (Simpson's Paradox)

29. $2z$ (There are three random variables: $N$ = the stopping time to get $Y=1$, $Y$, and $X$. Use the conditioning properly. Take your time)

30. $\frac{40}{3}$ (Use the fact that, in the given problem, one Poisson count conditioned on the total of two independent Poissons follows a Binomial distribution; a sketch of this fact is given below)
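
As a quick reference for the hint in 30 (a sketch of the standard fact only; the specifics of the exam problem are not reproduced here): if $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then

$$ P(X = x \mid X+Y = n) = \frac{e^{-\lambda_1}\frac{\lambda_1^x}{x!} \cdot e^{-\lambda_2}\frac{\lambda_2^{n-x}}{(n-x)!}}{e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^n}{n!}} = \binom{n}{x} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^x \left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-x}, $$

that is, $X \mid X+Y=n \sim \text{Bin}\left(n, \frac{\lambda_1}{\lambda_1+\lambda_2}\right)$.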


ISI MStat 2021 PSB Solutions
Coming soon.

ISI MStat PSB 2021 Problem 1

Solution

ISI MStat PSB 2021 Problem 2

Solution

ISI MStat PSB 2021 Problem 3

Solution

ISI MStat PSB 2021 Problem 4

Solution

ISI MStat PSB 2021 Problem 5

Solution

ISI MStat PSB 2021 Problem 6

Solution

ISI MStat PSB 2021 Problem 7

Solution

ISI MStat PSB 2021 Problem 8

Solution

ISI MStat PSB 2021 Problem 9

Solution

Please suggest changes in the comment section.


ISI MStat Entrance 2020 Problems and Solutions PSA & PSB

This post contains ISI MStat Entrance PSA and PSB 2020 Problems and Solutions that can be very helpful and resourceful for your ISI MStat Preparation.

ISI MStat Entrance 2020 Problems and Solutions - Subjective Paper


ISI MStat 2020 Problem 1

Let $f(x)=x^2-2x+2$. Let $L_1$ and $L_2$ be the tangents to its graph at $x=0$ and $x=2$ respectively. Find the area of the region enclosed by the graph of $f$ and the two lines $L_1$ and $L_2$.

Solution

ISI MStat 2020 Problem 2

Find the number of $3 \times 3$ matrices $A$ such that the entries of $A$ belong to the set $\mathbb{Z}$ of all integers, and such that the trace of $A^{t}A$ is 6. ($A^{t}$ denotes the transpose of the matrix $A$.)

Solution

ISI MStat 2020 Problem 3

Consider $n$ independent and identically distributed positive random variables $X_{1}, X_{2}, \ldots, X_{n}$. Suppose $S$ is a fixed subset of $\{1,2, \ldots, n\}$ consisting of $k$ distinct elements, where $1 \leq k<n$.
(a) Compute
$$
\mathrm{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right]
$$
(b) Assume that the $X_{i}$'s have mean $\mu$ and variance $\sigma^{2}$, $0<\sigma^{2}<\infty$. If $j \notin S$, show that the correlation between $\left(\sum_{i \in S} X_{i}\right) X_{j}$ and $\sum_{i \in S} X_{i}$ lies between $-\frac{1}{\sqrt{k+1}}$ and $\frac{1}{\sqrt{k+1}}$.

Solution

ISI MStat 2020 Problem 4

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables. Let $S_n = X_1 + \cdots + X_n$. For each of the following statements, determine whether they are true or false. Give reasons in each case.

(a) If $S_n \sim \text{Exp}$ with mean $n$, then each $X_i \sim \text{Exp}$ with mean 1.

(b) If $S_n \sim \text{Bin}(nk, p)$, then each $X_i \sim \text{Bin}(k, p)$.

Solution

ISI MStat 2020 Problem 5

Let $U_1, U_2, \ldots, U_n$ be independent and identically distributed random variables, each having a uniform distribution on $(0,1)$. Let $X=\min \{U_1, U_2, \ldots, U_n\}$ and $Y=\max \{U_1, U_2, \ldots, U_n\}$.

Evaluate $E[X \mid Y=y]$ and $E[Y \mid X=x]$.

Solution

ISI MStat 2020 Problem 6

Suppose individuals are classified into three categories $C_1, C_2$ and $C_3$. Let $p^2, (1-p)^2$ and $2p(1-p)$ be the respective population proportions, where $p \in (0,1)$. A random sample of $N$ individuals is selected from the population and the category of each selected individual is recorded.

For $i=1,2,3$, let $X_i$ denote the number of individuals in the sample belonging to category $C_i$. Define $U = X_1 + \frac{X_3}{2}$.

(a) Is $U$ sufficient for $p$? Justify your answer.

(b) Show that the mean squared error of $\frac{U}{N}$ is $\frac{p(1-p)}{2N}$.

Solution

ISI MStat 2020 Problem 7

Consider the following model:
$$
y_{i}=\beta x_{i}+\varepsilon_{i} x_{i}, \quad i=1,2, \ldots, n
$$
where $y_{i}, i=1,2, \ldots, n$ are observed; $x_{i}, i=1,2, \ldots, n$ are known positive constants and $\beta$ is an unknown parameter. The errors $\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}$ are independent and identically distributed random variables having the
probability density function
$$
f(u)=\frac{1}{2 \lambda} \exp \left(-\frac{|u|}{\lambda}\right),-\infty<u<\infty
$$
and $\lambda$ is an unknown parameter.
(a) Find the least squares estimator of $\beta$.
(b) Find the maximum likelihood estimator of $\beta$.

Solution

ISI MStat 2020 Problem 8

Assume that $X_{1}, \ldots, X_{n}$ is a random sample from $N(\mu, 1)$, with $\mu \in \mathbb{R}$. We want to test $H_{0}: \mu=0$ against $H_{1}: \mu=1$. For a fixed integer $m \in \{1, \ldots, n\}$, the following statistics are defined:

$$
\begin{aligned}
T_{1} &=\left(X_{1}+\ldots+X_{m}\right) / m \\
T_{2} &=\left(X_{2}+\ldots+X_{m+1}\right) / m \\
\vdots &= \vdots \\
T_{n-m+1} &=\left(X_{n-m+1}+\ldots+X_{n}\right) / m .
\end{aligned}
$$

Fix $\alpha \in(0,1)$. Consider the test

reject $H_{0}$ if $\max \{T_{i}: 1 \leq i \leq n-m+1\}>c_{m, \alpha}$.

Find a choice of $c_{m, \alpha} \in \mathbb{R}$ in terms of the standard normal distribution
function $\Phi$ that ensures that the size of the test is at most $\alpha$.

Solution

ISI MStat 2020 Problem 9

ISI MStat 2020 - Objective Paper


ISI MStat 2020 PSA Answer Key

Click on the links to learn about the detailed solution.

1. C    2. D    3. A    4. B    5. A
6. B    7. C    8. A    9. C    10. A
11. C    12. D    13. C    14. B    15. B
16. C    17. D    18. B    19. B    20. C
21. C    22. D    23. A    24. B    25. D
26. B    27. D    28. D    29. B    30. C

Please suggest changes in the comment section.

ISI MStat 2020 Probability Problems Discussion [Recorded Class]


ISI MStat PSB 2018 Problem 9 | Regression Analysis

This is a very simple sample problem from ISI MStat PSB 2018 Problem 9. It is mainly based on ordinary least squares estimates and maximum likelihood estimates of regression parameters. Try it!

Problem - ISI MStat PSB 2018 Problem 9


Suppose \((y_i,x_i)\) satisfies the regression model,

\( y_i= \alpha + \beta x_i + \epsilon_i \) for \(i=1,2,\ldots,n,\)

where \( \{ x_i : 1 \le i \le n \} \) are fixed constants and \( \{ \epsilon_i : 1 \le i \le n \} \) are i.i.d. \(N(0, \sigma^2)\) errors, where \(\alpha, \beta \) and \(\sigma^2 (>0)\) are unknown parameters.

(a) Let \(\tilde{\alpha}\) denote the least squares estimate of \(\alpha\) obtained assuming \(\beta=5\). Find the mean squared error (MSE) of \(\tilde{\alpha}\) in terms of model parameters.

(b) Obtain the maximum likelihood estimator of this MSE.

Prerequisites


Normal Distribution

Ordinary Least Square Estimates

Maximum Likelihood Estimates

Solution :

This problem is simple enough.

For the given model, \( y_i= \alpha + \beta x_i + \epsilon_i \) for \( i=1,\ldots,n \).

The scenario is even simpler here since it is given that \(\beta=5\), so our model reduces to,

\(y_i= \alpha + 5x_i + \epsilon_i \), where \( \epsilon_i \sim N(0, \sigma^2)\) and \(\epsilon_i \)'s are i.i.d.

now we know that the Ordinary Least Square (OLS) estimate of \(\alpha\) is

\( \tilde{\alpha} = \bar{y} - \tilde{\beta}\bar{x} \) (How ??), where \(\tilde{\beta}\) is (generally) the OLS estimate of \(\beta\); but here \(\beta=5\) is known, so,

\(\tilde{\alpha}= \bar{y} - 5\bar{x} \). Again,

\(E(\tilde{\alpha})=E( \bar{y}-5\bar{x})=\alpha+(\beta-5)\bar{x}\), hence \( \tilde{\alpha} \) is a biased estimator of \(\alpha\) with \(Bias_{\alpha}(\tilde{\alpha})= (\beta-5)\bar{x}\).

So, the Mean Squared Error, MSE of \(\tilde{\alpha}\) is,

\(MSE_{\alpha}(\tilde{\alpha})= E(\tilde{\alpha} - \alpha)^2=Var(\tilde{\alpha}) + {Bias^2}_{\alpha}(\tilde{\alpha}) \)

\(= \frac{\sigma^2}{n}+ \bar{x}^2(\beta-5)^2 \)

[ as, it follows clearly from the model, \( y_i \sim N( \alpha +\beta x_i , \sigma^2)\) and \(x_i\)'s are non-stochastic ] .

(b) The last part follows directly from the note I provided at the end of part (a):

that is, \(y_i \sim N( \alpha + \beta x_i , \sigma^2 ) \), and we have to find the maximum likelihood estimators of \(\sigma^2\) and \(\beta\), and then use the invariance property of the MLE (in the MSE obtained in (a)). I leave the details as an exercise!! Finish it yourself! (A sketch of the answer is given below for reference.)
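
For reference, a sketch of how that plug-in works out (the standard normal-likelihood calculations; do verify each step yourself). The MLEs under the full model are \( \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \), \( \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \) and \( \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2 \), so by the invariance property, the MLE of the MSE is

$$ \widehat{MSE}_{\alpha}(\tilde{\alpha}) = \frac{\hat{\sigma}^2}{n} + \bar{x}^2(\hat{\beta}-5)^2 . $$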


Food For Thought

Suppose you don't even know the value of \(\beta\). What will be the MSE of \(\tilde{\alpha}\) in that case?

Also, find the OLS estimate of \(\beta\); you have already done it for \(\alpha\). Now find the MLEs of both \(\alpha\) and \(\beta\). Are the OLS estimates identical to the MLEs you obtained? Which assumption induces this coincidence?? What do you think!!


Similar Problems and Solutions



ISI MStat PSB 2008 Problem 10
Outstanding Statistics Program with Applications

Subscribe to Cheenta at Youtube


ISI MStat PSB 2013 Problem 4 | Linear Regression

This is a sample problem from ISI MStat PSB 2013 Problem 4. It is based on the simple linear regression model, finding the estimates, and MSEs. But think over the "Food for Thought"; any kind of discussion will be appreciated. Give it a try!

Problem- ISI MStat PSB 2013 Problem 4


Consider \(n\) independent observations \( \{ (x_i,y_i) : 1 \le i \le n \} \) from the model

\( Y= \alpha + \beta x + \epsilon \) ,

where \(\epsilon\) is normal with mean 0 and variance \( \sigma^2\). Let \( \hat{\alpha}, \hat{\beta} \) and \( \hat{\sigma}^2 \) be the maximum likelihood estimators of \( \alpha , \beta \) and \( \sigma^2\), respectively. Let \( v_{11}, v_{22} \) and \(v_{12}\) be the estimated values of \( Var(\hat{\alpha}), Var(\hat{\beta}) \) and \( Cov ( \hat{\alpha}, \hat{\beta}) \), respectively.

(a) What is the estimated mean of Y when \( x=x_o\)? Estimate the mean squared error of this estimator.

(b) What is the predicted value of Y when \( x=x_o\)? Estimate the mean squared error of this predictor.

Prerequisites


Linear Regression

Method of Least Squares

Maximum likelihood Estimators.

Mean Squared Error.

Solution :

Here for the given model,

we have the random errors \( \epsilon \sim N(0, \sigma^2) \), and the maximum likelihood estimators (MLEs) of the model parameters are given by \( \hat{\alpha}, \hat{\beta} \) and \( \hat{\sigma}^2 \). The interesting thing about this model is that, since the random errors \(\epsilon\) are Gaussian random variables, the Ordinary Least Square (OLS) estimates of \( \alpha \) and \( \beta \) are identical to their Maximum Likelihood Estimators (which are already given!). How?? Verify it yourself once and remember it henceforth.
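
A quick verification sketch (using only the model stated above): the log-likelihood of the sample is

$$ \ell(\alpha,\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2, $$

so for any fixed \( \sigma^2 \), maximizing \( \ell \) over \( (\alpha, \beta) \) is exactly minimizing the residual sum of squares, which is the OLS criterion.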

So, here \( \hat{\alpha} \) and \( \hat{\beta} \) are also the OLS estimates of the corresponding model parameters.

And by the Gauss-Markov Theorem, the OLS estimates of the model parameters are BLUE (Best Linear Unbiased Estimators). So, here \( \hat{\alpha} \) and \( \hat{\beta} \) are also unbiased estimators of \( \alpha \) and \( \beta \) respectively.

(a) Now we need to find the estimated mean of Y given \(x=x_o\):

\( \hat{ E( Y| x=x_o)}= \hat{\alpha} + \hat{\beta} x_o \) is the estimated mean of Y given \( x=x_o\).

Now since, the given MLEs ( OLSEs) are also unbiased for their respective parameters,

\( MSE( \hat{ E( Y| x=x_o)})=MSE(\hat{\alpha} + \hat{\beta} x_o)=E(\hat{\alpha} + \hat{\beta} x_o-(\alpha + \beta x_o))^2 \)

=\( E(\hat{\alpha} + \hat{\beta} x_o-E(\hat{\alpha} + \hat{\beta} x_o))^2 \)

=\( Var( \hat{\alpha} + \hat{\beta} x_o) \)

= \( Var(\hat{\alpha}) + 2x_o Cov(\hat{\alpha}, \hat{\beta}) + {x_o}^2 Var(\hat{\beta}) \)

So, \( MSE( \hat{ E( Y| x=x_o)})= v_{11} +2x_o v_{12} + {x_o}^2 {v_{22}} \).

(b) Similarly, when \(x=x_o \) , the predicted value of Y would be,

\( \hat{Y} = \hat{\alpha} + \hat{\beta} x_o \) is the predicted value of Y when \(x=x_o\) is given, while the new response itself is \( Y = \alpha + \beta x_o + \epsilon \), with the new error \( \epsilon \) independent of \( (\hat{\alpha}, \hat{\beta}) \).

Using arguments similar to those in (a), together with this independence, verify that,

\(MSE(\hat{Y})= v_{11}+ 2x_o v_{12} + {x_o}^2{ v_{22}}+{\hat{\sigma}^2} \). Hence we are done !
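
In case the extra \( \hat{\sigma}^2 \) looks mysterious, here is the one-line decomposition it estimates (a sketch, using the independence noted above):

$$ E\left(\hat{\alpha}+\hat{\beta}x_o - (\alpha+\beta x_o+\epsilon)\right)^2 = Var(\hat{\alpha}+\hat{\beta}x_o) + \sigma^2, $$

and replacing each unknown quantity by its estimate gives \( v_{11}+ 2x_o v_{12} + {x_o}^2 v_{22} + \hat{\sigma}^2 \).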


Food For Thought

Now, can you explain why the Maximum Likelihood Estimators and Ordinary Least Square Estimates are identical when the model assumes Gaussian errors??

Wait!! Not done yet. The main course is served below !!

In a game of darts, a thrower throws a dart randomly and uniformly in a unit circle. Let \(\theta\) be the angle between the horizontal axis and the line segment joining the center to the dart, and let Z be a random variable: when the thrower is left-handed, Z=-1, and when the thrower is right-handed, Z=1. Assume that getting a left-handed and a right-handed thrower is equally likely (is it really equally likely in a real scenario??). Can you construct a regression model for regressing \(\theta\) on Z?

Think over it, if you want to discuss, we can do that too !!


Similar Problems and Solutions



ISI MStat PSB 2008 Problem 10
Outstanding Statistics Program with Applications

Subscribe to Cheenta at Youtube


Size, Power, and Condition | ISI MStat 2019 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2019. It primarily tests one's familiarity with the size and power of a test, and whether one is able to condition on an event properly.

The Problem:

Let Z be a random variable with probability density function

\( f(z)=\frac{1}{2} e^{-|z- \mu|} , z \in \mathbb{R} \) with parameter \( \mu \in \mathbb{R} \). Suppose, we observe \(X = \) max \( (0,Z) \).

(a) Find the constant c such that the test that "rejects when \( X>c \)" has size 0.05 for the null hypothesis \(H_0 : \mu=0 \).

(b) Find the power of this test against the alternative hypothesis \(H_1: \mu =2 \).

Prerequisites:

And believe me, as Joe Blitzstein says: "Conditioning is the soul of statistics."

Solution:

(a) If you know what the size of a test means, then you can easily write down the condition mentioned in part (a) in mathematical terms.

It simply means \( P_{H_0}(X>c)=0.05 \)

Now, under \( H_0 \), \( \mu=0 \).

So, we have the pdf of Z as \( f(z)=\frac{1}{2} e^{-|z|} \)

As the support of Z is \( \mathbb{R} \), we can partition it into \( \{Z \ge 0,Z <0 \} \).

Now, let's condition based on this partition. So, we have:

\( P_{H_0}(X > c)=P_{H_0}(X>c , Z \ge 0)+ P_{H_0}(X>c, Z<0) =P_{H_0}(X>c , Z \ge 0) =P_{H_0}(Z > c) \)

Do you understand the last equality? (Try to convince yourself why.)

So, \( P_{H_0}(X >c)=P_{H_0}(Z > c)=\int_{c}^{\infty} \frac{1}{2} e^{-|z|} dz = \frac{1}{2}e^{-c} \)

Equating \(\frac{1}{2}e^{-c} \) with 0.05, we get \( c= \ln{10} \)

(b) The second part is mere calculation, given that you already know the value of c.

Power of test against \(H_1 \) is given by:

\(P_{H_1}(X>\ln{10})=P_{H_1}(Z > \ln{10})=\int_{\ln{10}}^{\infty} \frac{1}{2} e^{-|z-2|} dz = \frac{e^2}{20} \)
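
A small step worth making explicit (this is the only place the absolute value matters): since \( \ln{10} \approx 2.303 > 2 \), we have \( |z-2| = z-2 \) over the whole range of integration, so

$$ \int_{\ln{10}}^{\infty} \frac{1}{2} e^{-(z-2)} dz = \frac{1}{2} e^{2-\ln{10}} = \frac{e^2}{20} \approx 0.37 . $$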

Try out this one:

The pdf occurring in this problem is an example of a Laplace distribution. Look it up on the internet if you are not aware of it, and go through its properties.

Suppose you have a random variable V which follows Exponential Distribution with mean 1.

Let I be a Bernoulli(\(\frac{1}{2} \)) random variable. It is given that I and V are independent.

Can you find a function h (which is also a random variable), \(h=h(I,V) \) ( a continuous function of I and V) such that h has the standard Laplace distribution?

Restricted Regression Problem | ISI MStat 2017 PSB Problem 7

This is a regression problem, where we use the ordinary least squares method to estimate the parameters in a restricted scenario. This is ISI MStat 2017 PSB Problem 7.

Problem

Consider independent observations \(\{\left(y_{i}, x_{1 i}, x_{2 i}\right): 1 \leq i \leq n\}\) from the regression model
$$
y_{i}=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}+\epsilon_{i}, i=1, \ldots, n
$$ where \(x_{1 i}\) and \(x_{2 i}\) are scalar covariates, \(\beta_{1}\) and \(\beta_{2}\) are unknown scalar
coefficients, and \(\epsilon_{i}\) are uncorrelated errors with mean 0 and variance \(\sigma^{2}>0\). Instead of using the correct model, we obtain an estimate \(\hat{\beta_{1}}\) of \(\beta_{1}\) by minimizing
$$
\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}
$$ Find the bias and mean squared error of \(\hat{\beta}_{1}\).

Prerequisites

Solution

It is sort of a restricted regression problem, because maybe we have tested the hypothesis that \(\beta_2 = 0\). Hence, we are interested in the estimate of \(\beta_1\) given \(\beta_2 = 0\). This is essentially the statistical motivation behind this problem, and we will see how it turns out in the estimate of \(\beta_1\).

Let's start with some notation:
\( s_{a,b} = \sum_{i=1}^{n} a_{i} b_{i} \)

Let's minimize \( L(\beta_1) = \sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}\) by differentiating w.r.t \(\beta_1\) and equating to 0.

\( \frac{dL(\beta_1)}{d\beta_1} = -2 \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0\)

\( \Rightarrow \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0 \)

\( \Rightarrow \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}} \)

From the given conditions, \( E(Y_{i})=\beta_{1} X_{1 i}+\beta_{2} X_{2 i}\).

\( \Rightarrow E(s_{X_{1},Y}) = \beta_{1}s_{X_{1},X_{1}} +\beta_{2} s_{X_{1},X_{2}} \).

Since the \(x\)'s are constants, \( E(\hat{\beta_1}) = \beta_{1} +\beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

\( Bias(\hat{\beta_1}) = \beta_{2} \frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}} \).

Thus, observe that the closer \( \beta_2 \) is to 0, the closer the bias is to 0. (The bias also vanishes when \( s_{X_{1},X_{2}} = 0 \), i.e., when the two covariate vectors are orthogonal.)

From the given conditions,

\( Y_{i} - \beta_{1} X_{1 i} - \beta_{2} X_{2 i} \sim \text{Something}(0 , \sigma^2) \).

\( \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}} \sim \text{Something}( E(\hat{\beta_{1}}) , Var(\hat{\beta_1})) \).

\(Var(\hat{\beta_1}) = \frac{\sum_{i=1}^{n} x_{1i}^2 Var(Y_{i})}{s_{X_1, X_1}^2} = \frac{\sigma^2}{s_{X_1, X_1}} \)

\( MSE(\hat{\beta_1}) = Variance + \text{Bias}^2 = \frac{\sigma^2}{s_{X_1, X_1}} + \beta_{2}^2(\frac{s_{X_{1},X_{2}}}{s_{X_{1},X_{1}}})^2\)

Observe that even the MSE is minimized if \(\beta_2 = 0\).
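
As a sanity check on the bias formula (a hypothetical special case, not part of the exam problem): if the omitted covariate coincides with the included one, i.e. \( x_{2i} = x_{1i} \) for all \( i \), then \( s_{X_{1},X_{2}} = s_{X_{1},X_{1}} \), so

$$ E(\hat{\beta_1}) = \beta_1 + \beta_2, \quad Bias(\hat{\beta_1}) = \beta_2, $$

which is exactly what one expects, since the true model then reads \( y_i = (\beta_1+\beta_2)x_{1i} + \epsilon_i \).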

Invariant Regression Estimate | ISI MStat 2016 PSB Problem 7

This cute little problem gives us the wisdom that when two functions are each uniquely minimized at the same point, their sum is also minimized at that point. This invariance idea is applied to calculate the least squares estimates of a two-group regression in ISI MStat 2016 Problem 7.

Problem- Invariant Regression Estimate

Suppose \(\{(y_{i}, x_{1 i}, x_{2 i}, \ldots, x_{k i}): i=1,2, \ldots, n_{1}+n_{2}\}\) represents a set of multivariate observations. It is found that the least squares linear regression fit of \(y\) on \(\left(x_{1}, \ldots, x_{k}\right)\) based on the first \(n_{1}\) observations is the same as that based on the remaining \(n_{2}\) observations, and is given by
\(y=\hat{\beta}_{0}+\sum_{j=1}^{k} \hat{\beta}_{j} x_{j}\)
If the regression is now performed using all \(\left(n_{1}+n_{2}\right)\) observations, will the regression equation remain the same? Justify your answer.

Prerequisites

Solution

Observe that we need to find the OLS estimates of \( \beta_{i} \) for all \( i \).

\(f(\tilde{\beta}) = \sum_{i = 1}^{n_1} \left(y_{i} - {\beta}_{0} - \sum_{j=1}^{k} {\beta}_{j} x_{j i}\right)^2 \), where \(\tilde{\beta} = ({\beta}_0, {\beta}_1, \ldots, {\beta}_k ) \)

\(g(\tilde{\beta}) = \sum_{i = n_1 + 1}^{n_1+n_2} \left(y_{i} - {\beta}_{0} - \sum_{j=1}^{k} {\beta}_{j} x_{j i}\right)^2 \)

By hypothesis, both \(f\) and \(g\) are uniquely minimized at the common fitted coefficient vector \( \hat{\tilde{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k) \).

\( h(\tilde{\beta}) = f(\tilde{\beta}) + g(\tilde{\beta}) = \sum_{i = 1}^{n_1+n_2} \left(y_{i} - {\beta}_{0} - \sum_{j=1}^{k} {\beta}_{j} x_{j i}\right)^2 \)

Now, \( h(\tilde{\beta})\) is the squared error loss under the grouped regression, which needs to be minimized with respect to \(\tilde{\beta} \).

Now, by the given conditions, \(f(\tilde{\beta})\) and \(g(\tilde{\beta})\) are both uniquely minimized at \( \hat{\tilde{\beta}}\), therefore \(h(\tilde{\beta}) = f(\tilde{\beta}) + g(\tilde{\beta})\) will be uniquely minimized at \(\hat{\tilde{\beta}}\), by the lemma sketched below.
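
The lemma, with its one-line proof: if \(f\) and \(g\) are each uniquely minimized at the same point \( \hat{\tilde{\beta}} \), then for any \( \tilde{\beta} \neq \hat{\tilde{\beta}} \),

$$ h(\tilde{\beta}) = f(\tilde{\beta}) + g(\tilde{\beta}) > f(\hat{\tilde{\beta}}) + g(\hat{\tilde{\beta}}) = h(\hat{\tilde{\beta}}), $$

so \(h\) is uniquely minimized at \( \hat{\tilde{\beta}} \) as well.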

Hence, the final estimate of \(\tilde{\beta}\) will be \( \hat{\tilde{\beta}}\).