## ISI MStat Entrance 2020 Problems and Solutions

This post contains Indian Statistical Institute, ISI MStat Entrance 2020 problems and solutions. Try to solve them yourself before looking at the solutions.

## Subjective Paper – ISI MStat Entrance 2020 Problems and Solutions

• Let $f(x)=x^{2}-2 x+2$. Let $L_{1}$ and $L_{2}$ be the tangents to its graph at $x=0$ and $x=2$ respectively. Find the area of the region enclosed by the graph of $f$ and the two lines $L_{1}$ and $L_{2}$.

Solution
• Find the number of $3 \times 3$ matrices $A$ such that the entries of $A$ belong to the set $\mathbb{Z}$ of all integers, and such that the trace of $A^{t} A$ is $6$. ($A^{t}$ denotes the transpose of the matrix $A$.)

Solution
• Consider $n$ independent and identically distributed positive random variables $X_{1}, X_{2}, \ldots, X_{n}$. Suppose $S$ is a fixed subset of $\{1,2, \ldots, n\}$ consisting of $k$ distinct elements, where $1 \leq k<n$.
(a) Compute $\mathbb{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right]$

(b) Assume that the $X_{i}$'s have mean $\mu$ and variance $\sigma^{2}$, $0<\sigma^{2}<\infty$. If $j \notin S$, show that the correlation between $\left(\sum_{i \in S} X_{i}\right) X_{j}$ and $\sum_{i \in S} X_{i}$ lies between $-\frac{1}{\sqrt{k+1}}$ and $\frac{1}{\sqrt{k+1}}$.

Solution
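For part (a), exchangeability already suggests the answer: each ratio $\frac{X_i}{\sum_{j} X_j}$ has the same expectation and the $n$ of them sum to $1$, so the expectation should be $\frac{k}{n}$. Here is a quick Monte Carlo sanity check of that heuristic (an illustration added here, not part of the original post; the Exponential law and all numeric values are arbitrary choices, since the claim holds for any positive i.i.d. distribution):

```python
import random

# Monte Carlo sanity check for part (a): by exchangeability each X_i / sum
# contributes equally, suggesting E[ sum_{i in S} X_i / sum_i X_i ] = k/n.
# The Exponential(1) law is an arbitrary choice of positive iid distribution.
random.seed(0)
n, k, trials = 5, 2, 200_000
total = 0.0
for _ in range(trials):
    x = [random.expovariate(1.0) for _ in range(n)]
    total += sum(x[:k]) / sum(x)   # take S = {1, ..., k} without loss of generality
estimate = total / trials
print(round(estimate, 3))  # should be close to k/n = 0.4
```

The bounds in part (b) are distribution-free, so no single simulation can confirm them in general; the check above only illustrates the symmetry argument of part (a).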
• Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent and identically distributed random variables. Let $S_{n}=X_{1}+\cdots+X_{n}$. For each of the following statements, determine whether it is true or false. Give reasons in each case.

(a) If $S_{n} \sim \mathrm{Exp}$ with mean $n$, then each $X_{i} \sim \mathrm{Exp}$ with mean $1$.

(b) If $S_{n} \sim \mathrm{Bin}(n k, p)$, then each $X_{i} \sim \mathrm{Bin}(k, p)$.

Solution
• Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent and identically distributed random variables, each having a uniform distribution on $(0,1)$. Let $X=\min \{U_{1}, U_{2}, \ldots, U_{n}\}$ and $Y=\max \{U_{1}, U_{2}, \ldots, U_{n}\}$.

Evaluate $\mathbb{E}[X \mid Y=y]$ and $\mathbb{E}[Y \mid X=x]$.

Solution
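For intuition on the first conditional expectation: given $Y = y$, the remaining $n-1$ observations behave like i.i.d. Uniform$(0, y)$ points, which suggests $\mathbb{E}[X \mid Y=y] = \frac{y}{n}$. Below is a binned Monte Carlo check of that heuristic (an illustration added here, not part of the original post; the values of $n$, $y_0$ and the bin width are arbitrary):

```python
import random

# Heuristic check: given Y = y, the other n-1 observations behave like iid
# Uniform(0, y), suggesting E[X | Y = y] = y/n.  We approximate the
# conditioning by keeping only samples whose maximum lands in a small bin
# around y0.  All numeric values are arbitrary illustration choices.
random.seed(1)
n, trials, y0, halfwidth = 3, 500_000, 0.8, 0.01
selected = []
for _ in range(trials):
    u = [random.random() for _ in range(n)]
    if abs(max(u) - y0) < halfwidth:
        selected.append(min(u))
estimate = sum(selected) / len(selected)
print(round(estimate, 3))  # should be near y0/n ~ 0.267
```

The same binning trick, applied to $X$ instead of $Y$, lets you probe $\mathbb{E}[Y \mid X=x]$ numerically as well.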
• Suppose individuals are classified into three categories $C_{1}, C_{2}$ and $C_{3}$. Let $p^{2}, (1-p)^{2}$ and $2 p(1-p)$ be the respective population proportions, where $p \in(0,1)$. A random sample of $N$ individuals is selected from the population and the category of each selected individual is recorded.

For $i=1,2,3$, let $X_{i}$ denote the number of individuals in the sample belonging to category $C_{i}$. Define $U=X_{1}+\frac{X_{3}}{2}$.

(a) Is $U$ sufficient for $p ?$ Justify your answer.

(b) Show that the mean squared error of $\frac{U}{N}$ is $\frac{p(1-p)}{2 N}$.

Solution
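Part (b)'s claim can be sanity-checked by simulation before attempting the algebra. The sketch below (an illustration added here, not part of the original solution; the values of $p$ and $N$ are arbitrary) draws $(X_1, X_2, X_3)$ from the stated multinomial model and compares the empirical MSE of $U/N$ with $\frac{p(1-p)}{2N}$:

```python
import random

# Monte Carlo check of part (b): with (X1, X2, X3) ~ Multinomial(N; p^2,
# (1-p)^2, 2p(1-p)) and U = X1 + X3/2, the claim is MSE(U/N) = p(1-p)/(2N).
# The values of p and N are arbitrary illustration choices.
random.seed(2)
p, N, trials = 0.3, 50, 100_000
probs = [p * p, (1 - p) ** 2, 2 * p * (1 - p)]
sq_err = 0.0
for _ in range(trials):
    x1 = x3 = 0
    for _ in range(N):                  # classify N individuals one by one
        r = random.random()
        if r < probs[0]:
            x1 += 1
        elif r >= probs[0] + probs[1]:
            x3 += 1
    u = x1 + x3 / 2
    sq_err += (u / N - p) ** 2
mse = sq_err / trials
print(round(mse, 5), round(p * (1 - p) / (2 * N), 5))  # the two should agree
```

That the empirical MSE matches the variance formula also reflects the fact that $U/N$ is unbiased for $p$, since $E[U] = Np^2 + Np(1-p) = Np$.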
• Consider the following model: $y_{i}=\beta x_{i}+\varepsilon_{i} x_{i}, \quad i=1,2, \ldots, n$, where $y_{i}, i=1,2, \ldots, n$ are observed; $x_{i}, i=1,2, \ldots, n$ are known positive constants and $\beta$ is an unknown parameter. The errors $\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}$ are independent and identically distributed random variables having the probability density function $f(u)=\frac{1}{2 \lambda} \exp \left(-\frac{|u|}{\lambda}\right), \quad-\infty<u<\infty$ and $\lambda$ is an unknown parameter.

(a) Find the least squares estimator of $\beta$.

(b) Find the maximum likelihood estimator of $\beta$.

Solution
• Assume that $X_{1}, \ldots, X_{n}$ is a random sample from $N(\mu, 1)$, with $\mu \in \mathbb{R}$. We want to test $H_{0}: \mu=0$ against $H_{1}: \mu=1$. For a fixed integer $m \in\{1, \ldots, n\}$, the following statistics are defined:

$$\begin{aligned}
T_{1} &= \frac{X_{1}+\ldots+X_{m}}{m} \\
T_{2} &= \frac{X_{2}+\ldots+X_{m+1}}{m} \\
&\ \ \vdots \\
T_{n-m+1} &= \frac{X_{n-m+1}+\ldots+X_{n}}{m}
\end{aligned}$$

Fix $\alpha \in(0,1)$. Consider the test:

Reject $H_{0}$ if $\max \{T_{i}: 1 \leq i \leq n-m+1\}>c_{m, \alpha}$.

Find a choice of $c_{m, \alpha} \in \mathbb{R}$ in terms of the standard normal distribution function $\Phi$ that ensures that the size of the test is at most $\alpha$.

Solution
• A finite population has $N$ units, with $x_{i}$ being the value associated with the $i$-th unit, $i=1,2, \ldots, N$. Let $\bar{x}_{N}$ be the population mean. A statistician carries out the following experiment.

Step 1: Draw an SRSWOR of size $n$ $(1<n<N)$ from the population; call this sample $S_{1}$ and denote its sample mean by $\bar{X}_{n}$.

Step 2: Draw an SRSWR of size $m$ from $S_{1}$. The $x$-values of the sampled units are denoted by $\{Y_{1}, \ldots, Y_{m}\}$.

An estimator of the population mean is defined as,

$\widehat{T}_{m}=\frac{1}{m} \sum_{i=1}^{m} Y_{i}$

(a) Show that $\widehat{T}_{m}$ is an unbiased estimator of the population mean.

(b) Which of the following has lower variance: $\widehat{T}_{m}$ or $\bar{X}_{n}$?

Solution

## Objective Paper

 1. C 2. D 3. A 4. B 5. A 6. B 7. C 8. A 9. C 10. A 11. C 12. D 13. C 14. B 15. B 16. C 17. D 18. B 19. B 20. C 21. C 22. D 23. A 24. B 25. D 26. B 27. D 28. D 29. B 30. C

Watch videos related to the ISI MStat Problems here.


## ISI MStat PSB 2018 Problem 9 | Regression Analysis

This is a very simple sample problem from ISI MStat PSB 2018 Problem 9. It is mainly based on ordinary least squares estimates and maximum likelihood estimates of regression parameters. Try it!

## Problem – ISI MStat PSB 2018 Problem 9

Suppose $(y_i,x_i)$ satisfies the regression model,

$y_i= \alpha + \beta x_i + \epsilon_i$ for $i=1,2,\ldots,n,$

where $\{ x_i : 1 \le i \le n \}$ are fixed constants and $\{ \epsilon_i : 1 \le i \le n\}$ are i.i.d. $N(0, \sigma^2)$ errors, where $\alpha, \beta$ and $\sigma^2 (>0)$ are unknown parameters.

(a) Let $\tilde{\alpha}$ denote the least squares estimate of $\alpha$ obtained assuming $\beta=5$. Find the mean squared error (MSE) of $\tilde{\alpha}$ in terms of model parameters.

(b) Obtain the maximum likelihood estimator of this MSE.

### Prerequisites

Normal Distribution

Ordinary Least Square Estimates

Maximum Likelihood Estimates

## Solution :

This problem is simple enough.

For the given model, $y_i= \alpha + \beta x_i + \epsilon_i$ for $i=1,\ldots,n$.

The scenario is even simpler here since it is given that $\beta=5$, so our model reduces to,

$y_i= \alpha + 5x_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\epsilon_i$’s are i.i.d.

Now we know that the Ordinary Least Squares (OLS) estimate of $\alpha$ is

$\tilde{\alpha} = \bar{y} - \tilde{\beta}\bar{x}$ (How??), where $\tilde{\beta}$ is, in general, the OLS estimate of $\beta$; but here $\beta=5$ is known, so,

$\tilde{\alpha}= \bar{y} - 5\bar{x}$. Again,

$E(\tilde{\alpha})=E( \bar{y}-5\bar{x})=\alpha+(\beta-5)\bar{x}$, hence $\tilde{\alpha}$ is a biased estimator of $\alpha$ with $Bias_{\alpha}(\tilde{\alpha})= (\beta-5)\bar{x}$.

So, the Mean Squared Error, MSE of $\tilde{\alpha}$ is,

$MSE_{\alpha}(\tilde{\alpha})= E(\tilde{\alpha} - \alpha)^2=Var(\tilde{\alpha}) + Bias^2_{\alpha}(\tilde{\alpha})$

$= \frac{\sigma^2}{n}+ \bar{x}^2(\beta-5)^2$

[As it follows clearly from the model, $y_i \sim N( \alpha +\beta x_i , \sigma^2)$, and the $x_i$'s are non-stochastic.]
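The MSE just derived can also be checked numerically. The sketch below (an illustration added here, not part of the original solution) simulates the model with arbitrary illustrative values of $\alpha$, $\beta$, $\sigma$ and the $x_i$, none of which come from the problem, and compares the empirical MSE of $\tilde{\alpha}$ with $\frac{\sigma^2}{n} + \bar{x}^2(\beta-5)^2$:

```python
import random

# Simulation check: with alpha-tilde = ybar - 5*xbar, the MSE should equal
# sigma^2/n + xbar^2 * (beta - 5)^2.  The particular values of alpha, beta,
# sigma and the x_i below are arbitrary choices for illustration.
random.seed(3)
alpha, beta, sigma = 2.0, 6.0, 1.0
x = [0.5, 1.0, 1.5, 2.0]               # fixed, non-stochastic covariates
n = len(x)
xbar = sum(x) / n
trials, sq_err = 200_000, 0.0
for _ in range(trials):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    a_tilde = sum(y) / n - 5 * xbar    # OLS estimate with beta fixed at 5
    sq_err += (a_tilde - alpha) ** 2
mse_hat = sq_err / trials
mse_theory = sigma ** 2 / n + xbar ** 2 * (beta - 5) ** 2
print(round(mse_hat, 3), round(mse_theory, 3))  # the two should agree
```

Note how the bias term $\bar{x}^2(\beta-5)^2$ dominates here because the true $\beta$ was deliberately taken different from $5$.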

(b) The last part follows directly from the note at the end of part (a):

since $y_i \sim N( \alpha + \beta x_i , \sigma^2 )$, we have to find the Maximum Likelihood Estimators of $\sigma^2$ and $\beta$ and then use the invariance property of the MLE in the MSE obtained in (a). I leave it as an exercise!! Finish it yourself!

## Food For Thought

Suppose you don't even know the value of $\beta$. What will be the MSE of $\tilde{\alpha}$ in that case?

Also, find the OLS estimate of $\beta$; you have already done it for $\alpha$. Now find the MLEs of both $\alpha$ and $\beta$. Are the OLS estimates identical to the MLEs you obtained? Which assumption induces this coincidence? What do you think!!


## ISI MStat PSB 2013 Problem 4 | Linear Regression

This is a sample problem from ISI MStat PSB 2013 Problem 4. It is based on the simple linear regression model: finding the estimates and the MSEs. But think over the “Food for Thought”; any kind of discussion will be appreciated. Give it a try!

## Problem– ISI MStat PSB 2013 Problem 4

Consider $n$ independent observations $\{ (x_i,y_i) : 1 \le i \le n \}$ from the model

$Y= \alpha + \beta x + \epsilon$,

where $\epsilon$ is normal with mean $0$ and variance $\sigma^2$. Let $\hat{\alpha}, \hat{\beta}$ and $\hat{\sigma}^2$ be the maximum likelihood estimators of $\alpha , \beta$ and $\sigma^2$, respectively. Let $v_{11}, v_{22}$ and $v_{12}$ be the estimated values of $Var(\hat{\alpha})$, $Var(\hat{\beta})$ and $Cov ( \hat{\alpha}, \hat{\beta})$, respectively.

(a) What is the estimated mean of $Y$ when $x=x_o$? Estimate the mean squared error of this estimator.

(b) What is the predicted value of $Y$ when $x=x_o$? Estimate the mean squared error of this predictor.

### Prerequisites

Linear Regression

Method of Least Squares

Maximum likelihood Estimators.

Mean Squared Error.

## Solution :

Here for the given model,

we have the random errors $\epsilon \sim N(0, \sigma^2)$, and the maximum likelihood estimators (MLEs) of the model parameters are given by $\hat{\alpha}, \hat{\beta}$ and $\hat{\sigma}^2$. The interesting thing about this model is that, since the random errors $\epsilon$ are Gaussian random variables, the Ordinary Least Squares estimates of the model parameters $\alpha, \beta$ and $\sigma^2$ are identical to their Maximum Likelihood Estimators (which are already given!). How?? Verify it yourself once and remember it henceforth.

So, here $\hat{\alpha}, \hat{\beta}$ and $\hat{\sigma}^2$ are also the OLS estimates of the model parameters, respectively.

And by the Gauss-Markov theorem, the OLS estimates of $\alpha$ and $\beta$ are the BLUEs (Best Linear Unbiased Estimators) for these parameters. So, here $\hat{\alpha}$ and $\hat{\beta}$ are also unbiased estimators of $\alpha$ and $\beta$ respectively. (Note that the MLE $\hat{\sigma}^2$ is, in fact, a biased estimator of $\sigma^2$.)

(a) Now we need to find the estimated mean of $Y$ given $x=x_o$:

$\hat{ E( Y| x=x_o)}= \hat{\alpha} + \hat{\beta} x_o$ is the estimated mean of Y given $x=x_o$.

Now since, the given MLEs ( OLSEs) are also unbiased for their respective parameters,

$MSE( \hat{ E( Y| x=x_o)})=MSE(\hat{\alpha} + \hat{\beta} x_o)=E(\hat{\alpha} + \hat{\beta} x_o-(\alpha + \beta x_o))^2$

=$E(\hat{\alpha} + \hat{\beta} x_o-E(\hat{\alpha} + \hat{\beta} x_o))^2$

=$Var( \hat{\alpha} + \hat{\beta} x_o)$

= $Var(\hat{\alpha})+2x_o Cov(\hat{\alpha}, \hat{\beta})+ {x_o}^2Var(\hat{\beta})$

So, $MSE( \hat{ E( Y| x=x_o)})= v_{11} +2x_o v_{12} + {x_o}^2 v_{22}$.

(b) Similarly, when $x=x_o$ , the predicted value of Y would be,

$\hat{Y} = \hat{\alpha} + \hat{\beta} x_o$ is the predicted value of $Y$ when $x=x_o$ is given; the new response itself is $Y = \alpha + \beta x_o + \epsilon$, so the prediction error also picks up the variance of the fresh error $\epsilon$.

Using similar arguments as in (a), together with the covariance structure of $\hat{\alpha}$ and $\hat{\beta}$, verify that

$MSE(\hat{Y})= v_{11}+ 2x_o v_{12} + {x_o}^2{ v_{22}}+{\hat{\sigma}^2}$. Hence we are done !
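The variance identity used in part (a) can be verified numerically with the standard closed forms $Var(\hat{\alpha}) = \sigma^2(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}})$, $Var(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}$ and $Cov(\hat{\alpha},\hat{\beta}) = -\frac{\sigma^2 \bar{x}}{S_{xx}}$, using the true $\sigma^2$ rather than the estimates $v_{ij}$. The sketch below is an illustration added here, not part of the original post; all numeric values are arbitrary choices:

```python
import random

# Sanity check of Var(ahat + bhat*x0) =
#   Var(ahat) + 2*x0*Cov(ahat, bhat) + x0^2*Var(bhat),
# using the standard simple-linear-regression closed forms with the true
# sigma^2.  All numeric values are arbitrary illustration choices.
random.seed(4)
alpha, beta, sigma, x0 = 1.0, 2.0, 1.0, 1.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
trials, preds = 100_000, []
for _ in range(trials):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    bhat = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    ahat = ybar - bhat * xbar
    preds.append(ahat + bhat * x0)
m = sum(preds) / trials
var_emp = sum((p - m) ** 2 for p in preds) / trials
v_a = sigma ** 2 * (1 / n + xbar ** 2 / sxx)   # Var(ahat)
v_b = sigma ** 2 / sxx                         # Var(bhat)
c_ab = -sigma ** 2 * xbar / sxx                # Cov(ahat, bhat)
var_theory = v_a + 2 * x0 * c_ab + x0 ** 2 * v_b
print(round(var_emp, 3), round(var_theory, 3))  # the two should agree
```

For part (b) the same script extends naturally: add a fresh $\epsilon$ to each simulated response at $x_0$ and the empirical MSE gains exactly the extra $\sigma^2$ term.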

## Food For Thought

Now, can you explain why the Maximum Likelihood Estimators and the Ordinary Least Squares estimates are identical when the model assumes Gaussian errors??

Wait!! Not done yet. The main course is served below !!

In a game of darts, a thrower throws a dart randomly and uniformly into a unit circle. Let $\theta$ be the angle between the horizontal axis and the line segment joining the dart to the center. Now consider a random variable $Z$: when the thrower is left-handed, $Z=-1$, and when the thrower is right-handed, $Z=1$. Assume that a left-handed and a right-handed thrower are equally likely (is it really equally likely in a real scenario??). Can you construct a regression model for regressing $\theta$ on $Z$?

Think over it, if you want to discuss, we can do that too !!


## Size, Power, and Condition | ISI MStat 2019 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2019. It primarily tests one's familiarity with the size and power of a test and whether one can condition on an event properly.

## The Problem:

Let Z be a random variable with probability density function

$f(z)=\frac{1}{2} e^{-|z- \mu|} , z \in \mathbb{R}$

with parameter $\mu \in \mathbb{R}$. Suppose we observe $X = \max(0,Z)$.

(a) Find the constant $c$ such that the test that “rejects when $X>c$” has size $0.05$ for the null hypothesis $H_0 : \mu=0$.

(b) Find the power of this test against the alternative hypothesis $H_1: \mu =2$.

## Prerequisites:

• A thorough knowledge of the size and power of a test
• A good sense of conditioning whenever a function (like $\max$) is defined piecewise

And believe me as Joe Blitzstein says: “Conditioning is the soul of statistics”

## Solution:

(a) If you know what the size of a test means, then you can easily write down the condition mentioned in part (a) in mathematical terms.

It simply means $P_{H_0}(X>c)=0.05$

Now, under $H_0$, $\mu=0$.

So, we have the pdf of Z as $f(z)=\frac{1}{2} e^{-|z|}$

As the support of $Z$ is $\mathbb{R}$, we can partition it into $\{Z \ge 0\}$ and $\{Z < 0 \}$.

Now, let’s condition based on this partition. So, we have:

$P_{H_0}(X > c)=P_{H_0}(X>c , Z \ge 0)+ P_{H_0}(X>c, Z<0) =P_{H_0}(X>c , Z \ge 0) =P_{H_0}(Z > c)$

Do you understand the last equality? (Try to convince yourself why.)

So, $P_{H_0}(X >c)=P_{H_0}(Z > c)=\int_{c}^{\infty} \frac{1}{2} e^{-|z|} dz = \frac{1}{2}e^{-c}$

Equating $\frac{1}{2}e^{-c}$ with $0.05$, we get $c= \ln{10}$.

(b) The second part is mere calculation, given that you already know the value of $c$.

Power of test against $H_1$ is given by:

$P_{H_1}(X>\ln{10})=P_{H_1}(Z > \ln{10})=\int_{\ln{10}}^{\infty} \frac{1}{2} e^{-|z-2|} dz = \frac{e^2}{20}$
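Both answers are easy to confirm by simulation. The sketch below (an illustration added here, not part of the original post) draws from the Laplace density via the standard inverse-CDF formula and estimates the size under $\mu = 0$ and the power under $\mu = 2$:

```python
import math
import random

# Monte Carlo check: with c = ln(10), the size under mu = 0 should be 0.05
# and the power under mu = 2 should be e^2/20 ~ 0.369.
random.seed(5)
c, trials = math.log(10), 400_000

def laplace(mu):
    # Standard inverse-CDF sampler: if u ~ Uniform(-1/2, 1/2), then
    # mu - sgn(u) * ln(1 - 2|u|) has the Laplace density (1/2) e^{-|z - mu|}.
    u = random.random() - 0.5
    return mu - math.copysign(1, u) * math.log(1 - 2 * abs(u))

size = sum(max(0.0, laplace(0)) > c for _ in range(trials)) / trials
power = sum(max(0.0, laplace(2)) > c for _ in range(trials)) / trials
print(round(size, 3), round(power, 3))  # ~0.05 and ~0.369
```

Note that the observation is $X = \max(0, Z)$, exactly as in the problem, so the simulation also exercises the conditioning step: draws with $Z < 0$ can never reject.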

## Try out this one:

The pdf occurring in this problem is an example of a Laplace distribution. Look it up on the internet if you are not aware of it and go through its properties.

Suppose you have a random variable V which follows Exponential Distribution with mean 1.

Let $I$ be a Bernoulli($\frac{1}{2}$) random variable. It is given that $I$ and $V$ are independent.

Can you find a function $h$ (which is also a random variable), $h=h(I,V)$ (a continuous function of $I$ and $V$), such that $h$ has the standard Laplace distribution?


## Restricted Regression Problem | ISI MStat 2017 PSB Problem 7

This is a regression problem, where we use the ordinary least squares method to estimate a parameter in a restricted scenario. This is ISI MStat 2017 PSB Problem 7.

## Problem

Consider independent observations $\{\left(y_{i}, x_{1 i}, x_{2 i}\right): 1 \leq i \leq n\}$ from the regression model
$$y_{i}=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}+\epsilon_{i}, i=1, \ldots, n$$ where $x_{1 i}$ and $x_{2 i}$ are scalar covariates, $\beta_{1}$ and $\beta_{2}$ are unknown scalar
coefficients, and $\epsilon_{i}$ are uncorrelated errors with mean 0 and variance $\sigma^{2}>0$. Instead of using the correct model, we obtain an estimate $\hat{\beta_{1}}$ of $\beta_{1}$ by minimizing
$$\sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}$$ Find the bias and mean squared error of $\hat{\beta}_{1}$.

## Solution

It is sort of a restricted regression problem, because maybe we have tested and accepted the fact that $\beta_2 = 0$. Hence, we are interested in the estimate of $\beta_1$ given $\beta_2 = 0$. This is the statistical motivation behind the problem, and we will see how it shows up in the estimate of $\beta_1$.

Throughout, we use the shorthand $s_{a,b} = \sum_{i=1}^{n} a_{i} b_{i}$.

Let’s minimize $L(\beta_1) = \sum_{i=1}^{n}\left(y_{i}-\beta_{1} x_{1 i}\right)^{2}$ by differentiating w.r.t $\beta_1$ and equating to 0.

$\frac{dL(\beta_1)}{d\beta_1} = -2\sum_{i=1}^{n} x_{1 i}\left(y_{i}-\beta_{1} x_{1 i}\right) = 0$

$\Rightarrow \sum_{i=1}^{n} x_{1 i} \left(y_{i}-\beta_{1} x_{1 i}\right) = 0$

$\Rightarrow \hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}}$

From the given conditions, $E(y_{i})=\beta_{1} x_{1 i}+\beta_{2} x_{2 i}$.

$\Rightarrow E(s_{x_{1},y}) = \beta_{1}s_{x_{1},x_{1}} +\beta_{2} s_{x_{1},x_{2}}$.

Since the $x$'s are constants, $E(\hat{\beta_1}) = \beta_{1} +\beta_{2} \frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}}$.

$Bias(\hat{\beta_1}) = \beta_{2} \frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}}$.

Thus, observe that the closer $\beta_2$ is to $0$, the closer the bias is to $0$.

From the given conditions, $y_{i} - \beta_{1} x_{1 i} - \beta_{2} x_{2 i} = \epsilon_i$ has mean $0$ and variance $\sigma^2$ (the errors are not assumed to follow any particular distribution).

Hence $\hat{\beta_1} = \frac{s_{x_{1},y}}{s_{x_{1},x_{1}}}$ has mean $E(\hat{\beta_{1}})$ and variance

$Var(\hat{\beta_1}) = \frac{\sum_{i=1}^{n} x_{1i}^2 Var(y_{i})}{s_{x_1, x_1}^2} = \frac{\sigma^2}{s_{x_1, x_1}}$

$MSE(\hat{\beta_1}) = \text{Variance} + \text{Bias}^2 = \frac{\sigma^2}{s_{x_1, x_1}} + \beta_{2}^2\left(\frac{s_{x_{1},x_{2}}}{s_{x_{1},x_{1}}}\right)^2$

Observe that the MSE, too, is minimized when $\beta_2 = 0$.
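Both the bias and the MSE formulas can be checked by simulation. In the sketch below (an illustration added here, not part of the original solution) the covariate values, the coefficients, and the error law are arbitrary choices; Gaussian errors are used for convenience even though the problem only assumes mean $0$ and variance $\sigma^2$:

```python
import random

# Simulation check: the bias of beta1-hat should be beta2 * s12/s11 and its
# MSE should be sigma^2/s11 + (beta2 * s12/s11)^2, where s11 = sum(x1*x1)
# and s12 = sum(x1*x2).  All numeric values are arbitrary illustrations;
# Gaussian errors are one valid choice under the mean-0, variance-sigma^2
# assumption.
random.seed(6)
b1, b2, sigma = 1.0, 0.5, 1.0
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
trials, ests = 200_000, []
for _ in range(trials):
    y = [b1 * a + b2 * b + random.gauss(0, sigma) for a, b in zip(x1, x2)]
    ests.append(sum(a * yi for a, yi in zip(x1, y)) / s11)  # misspecified OLS
mean_est = sum(ests) / trials
bias_emp = mean_est - b1
mse_emp = sum((e - b1) ** 2 for e in ests) / trials
bias_theory = b2 * s12 / s11
mse_theory = sigma ** 2 / s11 + bias_theory ** 2
print(round(bias_emp, 3), round(bias_theory, 3))
print(round(mse_emp, 3), round(mse_theory, 3))
```

Setting `b2 = 0` in the script makes both the empirical bias and the bias term of the MSE vanish, matching the observation above.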


## Invariant Regression Estimate | ISI MStat 2016 PSB Problem 7

This cute little problem rests on the observation that if two functions are each uniquely minimized at the same point, then their sum is also uniquely minimized at that point. This fact is applied to the least squares estimates of a two-group regression in ISI MStat 2016 Problem 7.

## Problem- Invariant Regression Estimate

Suppose $\{(y_{i}, x_{1 i}, x_{2 i}, \ldots, x_{k i}): i=1,2, \ldots, n_{1}+n_{2}\}$ represents a set of multivariate observations. It is found that the least squares linear regression fit of $y$ on $\left(x_{1}, \ldots, x_{k}\right)$ based on the first $n_{1}$ observations is the same as that based on the remaining $n_{2}$ observations, and is given by
$y=\hat{\beta}_{0}+\sum_{j=1}^{k} \hat{\beta}_{j} x_{j}$
If the regression is now performed using all $\left(n_{1}+n_{2}\right)$ observations, will the regression equation remain the same? Justify your answer.

### Prerequisites

• If $f(\tilde{x})$ and $g(\tilde{x})$ are both uniquely minimized at $\tilde{x} = \tilde{x_0}$, then $f(\tilde{x}) + g(\tilde{x})$ is also uniquely minimized at $\tilde{x} = \tilde{x_0}$.

## Solution

Observe that we need to find the OLS estimates of $\beta_{i}$ for all $i$.

$f(\tilde{\beta}) = \sum_{i = 1}^{n_1} \left(y_i - \beta_{0} - \sum_{j=1}^{k} \beta_{j} x_{j i}\right)^2$, where $\tilde{\beta} = ({\beta}_0, {\beta}_1, \ldots, {\beta}_k )$

$g(\tilde{\beta}) = \sum_{i = n_1+1}^{n_1+n_2} \left(y_i - \beta_{0} - \sum_{j=1}^{k} \beta_{j} x_{j i}\right)^2$, where $\tilde{\beta} = ({\beta}_0, {\beta}_1, \ldots, {\beta}_k )$

Both fits are given to be uniquely minimized at the common point $\hat{\tilde{\beta}} = (\hat{\beta_0}, \hat{\beta_1}, \ldots, \hat{\beta_k})$.

$h(\tilde{\beta}) = f(\tilde{\beta}) + g(\tilde{\beta}) = \sum_{i = 1}^{n_1+n_2} \left(y_i - \beta_{0} - \sum_{j=1}^{k} \beta_{j} x_{j i}\right)^2$

Now, $h(\tilde{\beta})$ is the squared error loss under the grouped regression, which needs to be minimized with respect to $\tilde{\beta}$.

Now, by the given conditions, $f(\tilde{\beta})$ and $g(\tilde{\beta})$ are both uniquely minimized at $\hat{\tilde{\beta}}$; therefore $h(\tilde{\beta}) = f(\tilde{\beta}) + g(\tilde{\beta})$ is also uniquely minimized at $\hat{\tilde{\beta}}$, by the prerequisite.

Hence, the final estimate of $\tilde{\beta}$ will be $\hat{\tilde{\beta}}$.
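The argument can be illustrated numerically in the simplest case $k = 1$ (an illustration added here, not part of the original solution): construct two halves of a data set that produce the same least squares line, here noiseless data on the arbitrarily chosen line $y = 3 + 2x$, and check that the combined fit is that same line:

```python
# Numerical illustration of the argument with k = 1: if the two halves of
# the data produce the same least-squares line, the combined fit is that
# same line.  The data are constructed noiselessly so both halves fit
# y = 3 + 2x exactly; all numbers are arbitrary illustration choices.
def ols(xs, ys):
    # simple-linear-regression least squares fit, returns (intercept, slope)
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

x_a, x_b = [0.0, 1.0, 2.0], [5.0, 6.0, 9.0]   # two groups of covariates
y_a = [3 + 2 * x for x in x_a]
y_b = [3 + 2 * x for x in x_b]
fit_a = ols(x_a, y_a)
fit_b = ols(x_b, y_b)
fit_all = ols(x_a + x_b, y_a + y_b)
print(fit_a, fit_b, fit_all)  # all three should be (3.0, 2.0)
```

The noiseless construction is of course a special case, but it makes the unique-minimizer hypothesis hold exactly: each half's squared error loss is zero only at $(3, 2)$, so their sum is also minimized there.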