## ISI MStat Entrance 2020 Problems and Solutions

This post contains Indian Statistical Institute, ISI MStat Entrance 2020 Problems and Solutions. Try solving them yourself first.

## Subjective Paper – ISI MStat Entrance 2020 Problems and Solutions

• Let $f(x)=x^{2}-2 x+2$. Let $L_{1}$ and $L_{2}$ be the tangents to its graph at $x=0$ and $x=2$ respectively. Find the area of the region enclosed by the graph of $f$ and the two lines $L_{1}$ and $L_{2}$.

Solution
• Find the number of $3 \times 3$ matrices $A$ such that the entries of $A$ belong to the set $\mathbb{Z}$ of all integers, and such that the trace of $A^{t} A$ is 6. ($A^{t}$ denotes the transpose of the matrix $A$.)

Solution
• Consider $n$ independent and identically distributed positive random variables $X_{1}, X_{2}, \ldots, X_{n}$. Suppose $S$ is a fixed subset of $\{1,2, \ldots, n\}$ consisting of $k$ distinct elements, where $1 \leq k<n$.
(a) Compute $\mathbb{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right]$.

(b) Assume that the $X_{i}$'s have mean $\mu$ and variance $\sigma^{2}$, $0<\sigma^{2}<\infty$. If $j \notin S$, show that the correlation between $\left(\sum_{i \in S} X_{i}\right) X_{j}$ and $\sum_{i \in S} X_{i}$ lies between $-\frac{1}{\sqrt{k+1}}$ and $\frac{1}{\sqrt{k+1}}$.

Solution
• Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent and identically distributed random variables. Let $S_{n}=X_{1}+\cdots+X_{n}$. For each of the following statements, determine whether they are true or false. Give reasons in each case.

(a) If $S_{n} \sim \mathrm{Exp}$ with mean $n$, then each $X_{i} \sim \mathrm{Exp}$ with mean 1.

(b) If $S_{n} \sim \mathrm{Bin}(nk, p)$, then each $X_{i} \sim \mathrm{Bin}(k, p)$.

Solution
• Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent and identically distributed random variables, each having a uniform distribution on $(0,1)$. Let $X=\min \{U_{1}, U_{2}, \ldots, U_{n}\}$ and $Y=\max \{U_{1}, U_{2}, \ldots, U_{n}\}$.

Evaluate $\mathbb{E}[X \mid Y=y]$ and $\mathbb{E}[Y \mid X=x]$.

Solution
• Suppose individuals are classified into three categories $C_{1}, C_{2}$ and $C_{3}$. Let $p^{2},(1-p)^{2}$ and $2 p(1-p)$ be the respective population proportions, where $p \in(0,1)$. A random sample of $N$ individuals is selected from the population and the category of each selected individual is recorded.

For $i=1,2,3$, let $X_{i}$ denote the number of individuals in the sample belonging to category $C_{i}$. Define $U=X_{1}+\frac{X_{3}}{2}$.

(a) Is $U$ sufficient for $p ?$ Justify your answer.

(b) Show that the mean squared error of $\frac{U}{N}$ is $\frac{p(1-p)}{2 N}$.

Solution
• Consider the following model: $y_{i}=\beta x_{i}+\varepsilon_{i} x_{i}, \quad i=1,2, \ldots, n$, where the $y_{i}$, $i=1,2, \ldots, n$, are observed; the $x_{i}$, $i=1,2, \ldots, n$, are known positive constants; and $\beta$ is an unknown parameter. The errors $\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}$ are independent and identically distributed random variables having the probability density function $f(u)=\frac{1}{2 \lambda} \exp \left(-\frac{|u|}{\lambda}\right), \quad-\infty<u<\infty$, where $\lambda$ is an unknown parameter.

(a) Find the least squares estimator of $\beta$.

(b) Find the maximum likelihood estimator of $\beta$.

Solution
• Assume that $X_{1}, \ldots, X_{n}$ is a random sample from $N(\mu, 1)$, with $\mu \in \mathbb{R}$. We want to test $H_{0}: \mu=0$ against $H_{1}: \mu=1$. For a fixed integer $m \in\{1, \ldots, n\}$, the following statistics are defined:

$$
\begin{aligned}
T_{1} &= \frac{X_{1}+\ldots+X_{m}}{m} \\
T_{2} &= \frac{X_{2}+\ldots+X_{m+1}}{m} \\
&\ \ \vdots \\
T_{n-m+1} &= \frac{X_{n-m+1}+\ldots+X_{n}}{m}
\end{aligned}
$$

Fix $\alpha \in(0,1)$. Consider the test:

Reject $H_{0}$ if $\max \{T_{i}: 1 \leq i \leq n-m+1\}>c_{m, \alpha}$.

Find a choice of $c_{m, \alpha} \in \mathbb{R}$ in terms of the standard normal distribution function $\Phi$ that ensures that the size of the test is at most $\alpha$.

Solution
• A finite population has $N$ units, with $x_{i}$ being the value associated with the $i$th unit, $i=1,2, \ldots, N$. Let $\bar{x}_{N}$ be the population mean. A statistician carries out the following experiment.

Step 1: Draw an SRSWOR of size $n$ from the population; call this sample $S_{1}$ and denote its sample mean by $\bar{X}_{n}$.

Step 2: Draw an SRSWR of size $m$ from $S_{1}$. The $x$-values of the sampled units are denoted by $\{Y_{1}, \ldots, Y_{m}\}$.

An estimator of the population mean is defined as,

$\widehat{T}_{m}=\frac{1}{m} \sum_{i=1}^{m} Y_{i}$

(a) Show that $\widehat{T}_{m}$ is an unbiased estimator of the population mean.

(b) Which of the following has lower variance: $\widehat{T}_{m}$ or $\bar{X}_{n}$?

Solution

## Objective Paper

 1. C 2. D 3. A 4. B 5. A 6. B 7. C 8. A 9. C 10. A 11. C 12. D 13. C 14. B 15. B 16. C 17. D 18. B 19. B 20. C 21. C 22. D 23. A 24. B 25. D 26. B 27. D 28. D 29. B 30. C

Watch videos related to the ISI MStat Problems here.


## Testing of Hypothesis | ISI MStat 2016 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2016, involving the basic idea of the Type 1 error in testing of hypothesis, but focusing on the fundamental relationship between the Exponential and the Geometric distributions.

## The Problem:

Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from an exponential distribution with mean $\lambda$.

Assume that the observed data is available on $\left[X_{1}\right], \ldots,\left[X_{n}\right]$, instead of $X_{1}, \ldots, X_{n},$ where $[x]$ denotes the largest integer less than or equal to $x$.

Consider a test for $H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$ which rejects $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Given $\alpha \in(0,1),$ obtain values of $c_{n}$ such that the size of the test converges to $\alpha$ as $n \rightarrow \infty$.

## Prerequisites:

(a) Testing of Hypothesis

(b) Type 1 Error

(c) Exponential Distribution

(d) Relationship of Exponential Distribution and Geometric Distribution

(e) Central Limit Theorem

## Solution:

• If $X \sim \text{Exponential}(\lambda)$ (rate parametrization), then $Y = \left[\frac{X}{a}\right] \sim \text{Geom}(p)$, where $p = 1-e^{-\lambda a} \in(0,1)$.

Proof:

$Y$ is clearly discrete, taking values in the set of non-negative integers, due to the flooring. Then, for any integer $n \geq 0$ we have
$$P(Y=n)=P(X \in[an,\, a(n+1)))=\int_{an}^{a(n+1)} \lambda \mathrm{e}^{-\lambda x}\, dx=(1-p)^{n} p,$$
where $p=1-e^{-\lambda a} \in(0,1)$, as $\lambda>0$ and $a>0$.

• If $X_i \sim \text{Geom}(p)$ are i.i.d., then $\sum_{i = 1}^{n} X_i \sim \text{NBinom}(n,p)$.
• If $X_i \sim \text{Exponential}(\lambda)$ (rate $\lambda$), then $S_n = \sum_{i=1}^{n}\left[X_{i}\right] \sim \text{NBinom}(n,p)$, where $p = 1-e^{-\lambda} \in(0,1)$. (With the problem's mean-$\lambda$ convention we get $p = 1-e^{-1/\lambda}$; under $H_0: \lambda=1$ both give $p = 1-e^{-1}$.)
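These distributional facts are easy to sanity-check numerically. Below is a minimal simulation sketch (standard library only; the rate $\lambda = 2$ and bin width $a = 0.5$ are arbitrary choices of mine) comparing the empirical distribution of $\left[\frac{X}{a}\right]$ with the claimed $\text{Geom}(p)$ pmf:

```python
import math
import random

random.seed(0)
lam, a, N = 2.0, 0.5, 200_000        # arbitrary rate, bin width, sample size
p = 1 - math.exp(-lam * a)           # claimed Geometric success probability

counts = {}
for _ in range(N):
    x = random.expovariate(lam)      # X ~ Exponential with rate lam
    y = int(x // a)                  # Y = floor(X / a)
    counts[y] = counts.get(y, 0) + 1

emp = [counts.get(k, 0) / N for k in range(4)]
theo = [(1 - p) ** k * p for k in range(4)]   # Geom(p) pmf on {0, 1, 2, ...}
print([round(e, 3) for e in emp])
print([round(t, 3) for t in theo])
```

The two printed rows should agree up to Monte Carlo error, mirroring the integral computation in the proof above.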

#### Testing of Hypothesis

$H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$

We reject $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Here, the size of the test, i.e. the Type 1 error (for a simple hypothesis), is $\alpha_n = P(S_n > c_{n} \mid \lambda=1)$.

We want to select $c_n$ such that $\alpha_n \to \alpha$.

$S_n$ ~ NBinom($n,p$), where $p = 1-e^{-1}$ under $H_0$.

Now, $\frac{\sqrt{n}\left(\frac{S_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}} \xrightarrow{d} N(0,1)$ by the Central Limit Theorem, since each $[X_i]$ has mean $\frac{1-p}{p}$ and variance $\frac{1-p}{p^2}$.

Observe that thus $\alpha_n = P(S_n > c_{n} \mid \lambda=1) \to P\left(Z > \frac{\sqrt{n}\left(\frac{c_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}}\right) = \alpha$, where $Z \sim N(0,1)$.

Thus, we set $\frac{\sqrt{n}\left(\frac{c_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}} = z_{\alpha}$, the upper $\alpha$ point of $N(0,1)$.

Solving, $c_n = \frac{n(1-p)}{p} + z_{\alpha}\,\frac{\sqrt{n(1-p)}}{p}$, where $p = 1-e^{-1}$.
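As a sketch of that last step (the helper name `c_n` and the example arguments are my own, not part of the problem), the cutoff can be computed with the standard library's normal quantile:

```python
import math
from statistics import NormalDist

def c_n(n: int, alpha: float) -> float:
    """CLT-based cutoff: reject H0 when S_n > c_n(n, alpha)."""
    p = 1 - math.exp(-1)                    # p under H0 (lambda = 1)
    mean = n * (1 - p) / p                  # E[S_n], S_n ~ NBinom(n, p)
    sd = math.sqrt(n * (1 - p)) / p         # sd of S_n
    z = NormalDist().inv_cdf(1 - alpha)     # upper-alpha point of N(0, 1)
    return mean + z * sd

print(round(c_n(100, 0.05), 2))
```

Note how a smaller $\alpha$ pushes the cutoff further above the null mean $n(1-p)/p$.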

## Food for Thought

If $X \sim \text{Exponential}(\lambda)$, then what is the distribution of $\{X\}$, the fractional part of $X$? This question is crucial in getting back the Exponential distribution from the Geometric distribution.

Rather, the food for thought asks how we can recover the Exponential distribution from the Geometric distribution.

Stay Tuned. Stay Blessed! See you in the next post.


## ISI MStat PSB 2009 Problem 8 | How big is the Mean?

This is a very simple and regular sample problem from ISI MStat PSB 2009 Problem 8. It is based on testing the nature of the mean of the Exponential distribution. Give it a try!

## Problem– ISI MStat PSB 2009 Problem 8

Let $X_1,\ldots,X_n$ be i.i.d. observations from the density

$f(x)=\frac{1}{\mu}\exp\left(-\frac{x}{\mu}\right), \quad x>0,$

where $\mu >0$ is an unknown parameter.

Consider the problem of testing the hypothesis $H_o : \mu \le \mu_o$ against $H_1 : \mu > \mu_o$.

(a) Show that the test with critical region $\left[\bar{X} \ge \mu_o \chi^2_{2n,1-\alpha}/2n\right]$, where $\chi^2_{2n,1-\alpha}$ is the $(1-\alpha)$th quantile of the $\chi^2_{2n}$ distribution, has size $\alpha$.

(b) Give an expression for the power in terms of the c.d.f. of the $\chi^2_{2n}$ distribution.

### Prerequisites

Likelihood Ratio Test

Exponential Distribution

Chi-squared Distribution

## Solution :

This problem is quite regular and simple. From the given form of the hypotheses, it is almost clear that using Neyman-Pearson directly can land you in trouble. So let's go for something more general, that is, Likelihood Ratio Testing.

Hence, the likelihood function of $\mu$ for the given sample is

$L(\mu | \vec{X})=\left(\frac{1}{\mu}\right)^n \exp\left(-\frac{\sum_{i=1}^n X_i}{\mu}\right), \ \mu>0$; also observe that the sample mean $\bar{X}$ is the MLE of $\mu$.

So, the Likelihood Ratio statistic is,

$\lambda(\vec{x})=\frac{\sup_{\mu \le \mu_o}L(\mu |\vec{x})}{\sup_\mu L(\mu |\vec{x})} \\ =\begin{cases} 1 & \mu_o \ge \bar{X} \\ \frac{L(\mu_o|\vec{x})}{L(\bar{X}|\vec{x})} & \mu_o < \bar{X} \end{cases}$

So, our test function is ,

$\phi(\vec{x})=\begin{cases} 1 & \lambda(\vec{x})<k \\ 0 & otherwise \end{cases}$.

We reject $H_o$ at size $\alpha$ when $\phi(\vec{x})=1$, where $k$ is such that $E_{H_o}(\phi) \le \alpha$.

Hence, $\lambda(\vec{x}) < k \Rightarrow L(\mu_o|\vec{x})<k\,L(\bar{X}|\vec{x}) \Rightarrow -n\ln \mu_o-\frac{n\bar{X}}{\mu_o} < \ln k -n \ln \bar{X} - n \Rightarrow n \ln \bar{X}-\frac{n\bar{X}}{\mu_o} < K^*$

for some constant $K^*$.

Let $g(\bar{x})=n\ln \bar{x} -\frac{n\bar{x}}{\mu_o}$, and observe that $g$ is a decreasing function of $\bar{x}$ for $\bar{x} \ge \mu_o$ (since $g'(\bar{x})=\frac{n}{\bar{x}}-\frac{n}{\mu_o}<0$ there).

Hence, there exists a $c$ such that for $\bar{x} \ge c$ we have $g(\bar{x}) < K^*$.

So, the critical region of the test is of the form $\bar{X} \ge c$, for some $c$ such that

$P_{H_o}(\bar{X} \ge c)=\alpha$, where $\alpha \in (0,1)$ is the size of the test.

Now our task is to find $c$, and for that observe that if $X \sim \text{Exponential}(\theta)$ (mean $\theta$), then $\frac{2X}{\theta} \sim \chi^2_{2}$.

Hence, in this problem, since the $X_i$'s follow $\text{Exponential}(\mu)$, we have $\frac{2n\bar{X}}{\mu} \sim \chi^2_{2n}$, so

$P_{H_o}(\bar{X} \ge c)=\alpha \Rightarrow P_{H_o}\left(\frac{2n\bar{X}}{\mu_o} \ge \frac{2nc}{\mu_o}\right)=\alpha \Rightarrow P\left(\chi^2_{2n} \ge \frac{2nc}{\mu_o}\right)=\alpha,$

which gives $c=\frac{\mu_o \chi^2_{2n,1-\alpha}}{2n}$.

Hence, the rejection region is indeed $\left[\bar{X} \ge \frac{\mu_o \chi^2_{2n,1-\alpha}}{2n}\right]$.

Hence proved!

(b) Now, the power of the test is

$\beta(\mu)= E_{\mu}(\phi) = P_{\mu}\left(\bar{X} \ge \frac{\mu_o \chi^2_{2n,1-\alpha}}{2n}\right) = P\left(\chi^2_{2n} \ge \frac{\mu_o}{\mu}\chi^2_{2n,1-\alpha}\right).$

Hence, the power of the test is expressible in terms of the c.d.f. of the chi-squared distribution.
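A quick Monte Carlo sanity check of part (a). All parameter values here are illustrative choices of mine, and the $\chi^2_{2n}$ quantile is estimated by simulation rather than read from a table, so everything stays in the standard library:

```python
import random

random.seed(1)
n, mu0, alpha, M = 8, 2.0, 0.05, 100_000   # illustrative choices

# chi^2_{2n} is Gamma(shape n, scale 2): estimate its (1 - alpha) quantile
draws = sorted(random.gammavariate(n, 2.0) for _ in range(M))
q = draws[int((1 - alpha) * M)]
c = mu0 * q / (2 * n)                       # critical value for X-bar

# simulate the test at the boundary mu = mu0
rej = 0
for _ in range(M):
    xbar = sum(random.expovariate(1 / mu0) for _ in range(n)) / n
    rej += xbar >= c
rate = rej / M
print(round(rate, 3))                       # should be close to alpha
```

The rejection rate at $\mu=\mu_o$ lands near $\alpha$, as the size computation predicts.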

## Food For Thought

Can you use any other testing procedure to conduct this test ?


## ISI MStat PSB 2013 Problem 7 | Bernoulli interferes Normally

This is a very simple and beautiful sample problem from ISI MStat PSB 2013 Problem 7. It is mainly based on simple hypothesis testing for normal variables, just modified with a Bernoulli random variable. Try it!

## Problem– ISI MStat PSB 2013 Problem 7

Suppose $X_1$ and $X_2$ are two independent and identically distributed $N(\theta, 1)$ random variables. Further consider a Bernoulli random variable $V$ with $P(V=1)=\frac{1}{4}$ which is independent of $X_1$ and $X_2$. Define $X_3$ as

$X_3 = \begin{cases} X_1 & when & V=0 \\ X_2 & when & V=1 \end{cases}$

For testing $H_o: \theta= 0$ against $H_1: \theta=1$, consider the test:

Reject $H_o$ if $\frac{X_1+X_2+X_3}{3} >c$.

Find $c$ such that the test has size $0.05$.

### Prerequisites

Normal Distribution

Simple Hypothesis Testing

Bernoulli Trials

## Solution :

This problem is simple enough; the only trick is to observe that the test rule is based on three random variables, $X_1, X_2$ and $X_3$, but $X_3$ in turn depends on the Bernoulli variable $V$.

So here it is given that we reject $H_o$ at size $0.05$ if $\frac{X_1+X_2+X_3}{3}> c$, such that

$P_{H_o}(\frac{X_1+X_2+X_3}{3}>c)=0.05$

So, using the law of total probability (conditioning $X_3$ on $V$),

$P_{H_o}(X_1+X_2+X_3>3c|V=0)P(V=0)+P_{H_o}(X_1+X_2+X_3>3c|V=1)P(V=1)=0.05$

$\Rightarrow P_{H_o}(2X_1+X_2>3c)\frac{3}{4}+P_{H_o}(X_1+2X_2>3c)\frac{1}{4}=0.05$ [remember, $X_1$ and $X_2$ are independent of $V$].

Now, under $H_o$, $2X_1+X_2 \sim N(0,5)$ and $X_1+2X_2 \sim N(0,5)$, so the two probabilities coincide.

The rest is quite easy to figure out, and I leave it as an exercise!
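If you want to check your final answer numerically: under $H_o$ both mixture components of $X_1+X_2+X_3$ are $N(0,5)$, so the size condition reduces to $P(N(0,5) > 3c) = 0.05$. A simulation sketch (the seed and Monte Carlo size are my own choices):

```python
import math
import random
from statistics import NormalDist

random.seed(2)
# Both conditional distributions of X1 + X2 + X3 under H0 are N(0, 5),
# so the size condition reduces to P(N(0, 5) > 3c) = 0.05.
c = math.sqrt(5) * NormalDist().inv_cdf(0.95) / 3

M, rej = 200_000, 0
for _ in range(M):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    x3 = x2 if random.random() < 0.25 else x1   # V ~ Bernoulli(1/4)
    rej += (x1 + x2 + x3) / 3 > c
rate = rej / M
print(round(c, 3), round(rate, 3))
```

The simulated rejection rate under $H_o$ should come out close to $0.05$.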

## Food For Thought

Lets end this discussion with some exponential,

Suppose, $X_1,X_2,….,X_n$ are a random sample from $exponential(\theta)$ and $Y_1,Y_2,…..,Y_m$ is another random sample from the population of $exponential(\mu)$. Now you are to test $H_o: \theta=\mu$ against $H_1: \theta \neq \mu$ .

Can you show that the test can be based on a statistic $T$ such that, $T= \frac{\sum X_i}{\sum X_i +\sum Y_i}$.

What distribution you think, T should follow under null hypothesis ? Think it over !!


## ISI MStat PSB 2013 Problem 4 | Linear Regression

This is a sample problem from ISI MStat PSB 2013 Problem 4. It is based on the simple linear regression model: finding the estimates and MSEs. But think over the “Food for Thought”; any kind of discussion will be appreciated. Give it a try!

## Problem– ISI MStat PSB 2013 Problem 4

Consider $n$ independent observations $\{(x_i,y_i) : 1 \le i \le n\}$ from the model

$Y= \alpha + \beta x + \epsilon$ ,

where $\epsilon$ is normal with mean 0 and variance $\sigma^2$. Let $\hat{\alpha}, \hat{\beta}$ and $\hat{\sigma}^2$ be the maximum likelihood estimators of $\alpha, \beta$ and $\sigma^2$, respectively. Let $v_{11}, v_{22}$ and $v_{12}$ be the estimated values of $Var(\hat{\alpha})$, $Var(\hat{\beta})$ and $Cov(\hat{\alpha}, \hat{\beta})$, respectively.

(a) What is the estimated mean of $Y$ when $x=x_o$? Estimate the mean squared error of this estimator.

(b) What is the predicted value of $Y$ when $x=x_o$? Estimate the mean squared error of this predictor.

### Prerequisites

Linear Regression

Method of Least Squares

Maximum likelihood Estimators.

Mean Squared Error.

## Solution :

Here, for the given model,

we have the random errors $\epsilon \sim N(0, \sigma^2)$, and the maximum likelihood estimators (MLEs) of the model parameters are given by $\hat{\alpha}, \hat{\beta}$ and $\hat{\sigma}^2$. The interesting thing about this model is that, since the random errors are Gaussian, the ordinary least squares estimates of $\alpha$ and $\beta$ are identical to their maximum likelihood estimators (which are already given!). How? Verify it yourself once and remember it henceforth.

So here $\hat{\alpha}$ and $\hat{\beta}$ are also the OLS estimates of the corresponding model parameters.

And by the Gauss-Markov theorem, the OLS estimates are the BLUE (Best Linear Unbiased Estimators) of the model parameters. So here $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of $\alpha$ and $\beta$. (Note that the MLE $\hat{\sigma}^2$ is biased for $\sigma^2$; the unbiased version divides by $n-2$ rather than $n$.)

(a) Now we need to find the estimated mean of $Y$ given $x=x_o$:

$\hat{E}(Y \mid x=x_o)= \hat{\alpha} + \hat{\beta} x_o$ is the estimated mean of $Y$ given $x=x_o$.

Now, since the given MLEs (OLSEs) are unbiased for their respective parameters,

$MSE(\hat{E}(Y \mid x=x_o))=MSE(\hat{\alpha} + \hat{\beta} x_o)=E\left(\hat{\alpha} + \hat{\beta} x_o-(\alpha + \beta x_o)\right)^2$

$=E\left(\hat{\alpha} + \hat{\beta} x_o-E(\hat{\alpha} + \hat{\beta} x_o)\right)^2$

$=Var(\hat{\alpha} + \hat{\beta} x_o)$

$=Var(\hat{\alpha})+2x_o\, Cov(\hat{\alpha}, \hat{\beta})+ {x_o}^2\, Var(\hat{\beta})$

So, $MSE(\hat{E}(Y \mid x=x_o))= v_{11} +2x_o v_{12} + {x_o}^2 v_{22}$.
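This variance identity can be checked by simulation. In the sketch below all parameter values are arbitrary choices of mine, and $v_{11}, v_{22}, v_{12}$ are computed with the true $\sigma^2$ rather than its estimate, so only the formula for $Var(\hat{\alpha}+\hat{\beta}x_o)$ is being tested:

```python
import random

random.seed(7)
# arbitrary illustrative values (my own choices, not from the problem)
alpha_t, beta_t, sigma, n, x0 = 1.0, 2.0, 1.5, 30, 0.7
xs = [i / n for i in range(1, n + 1)]
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)

M, se = 20_000, 0.0
for _ in range(M):
    ys = [alpha_t + beta_t * x + random.gauss(0, sigma) for x in xs]
    ybar = sum(ys) / n
    b = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx   # OLS slope
    a = ybar - b * xbar                                      # OLS intercept
    se += (a + b * x0 - (alpha_t + beta_t * x0)) ** 2
mse_sim = se / M

# v11, v22, v12 with the true sigma^2 in place of the plug-in estimates
v11 = sigma ** 2 * (1 / n + xbar ** 2 / sxx)
v22 = sigma ** 2 / sxx
v12 = -sigma ** 2 * xbar / sxx
mse_theo = v11 + 2 * x0 * v12 + x0 ** 2 * v22
print(round(mse_sim, 4), round(mse_theo, 4))
```

The two printed numbers should agree to within Monte Carlo error.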

(b) Similarly, when $x=x_o$, the predicted value of $Y$ is $\hat{Y} = \hat{\alpha} + \hat{\beta} x_o$, while the quantity being predicted is a new observation $Y = \alpha + \beta x_o + \epsilon$, so the prediction error picks up the variance of the fresh error term as well.

Using arguments similar to those in (a), together with the independence of $\epsilon$ from $(\hat{\alpha}, \hat{\beta})$, verify that

$MSE(\hat{Y})= v_{11}+ 2x_o v_{12} + {x_o}^2 v_{22}+\hat{\sigma}^2$. Hence we are done!

## Food For Thought

Now, can you explain Why, the Maximum Likelihood Estimators and Ordinary Least Square Estimates are identical, when the model assumes Gaussian errors ??

Wait!! Not done yet. The main course is served below !!

In a game of darts, a thrower throws a dart randomly and uniformly in a unit circle. Let $\theta$ be the angle between the horizontal axis and the line segment joining the dart to the center. Now consider a random variable $Z$: when the thrower is left-handed, $Z=-1$, and when the thrower is right-handed, $Z=1$. Assume that left-handed and right-handed throwers are equally likely (is it really equally likely in a real scenario?). Can you construct a regression model for regressing $\theta$ on $Z$?

Think over it, if you want to discuss, we can do that too !!


## ISI MStat PSB 2014 Problem 9 | Hypothesis Testing

This is another beautiful sample problem from ISI MStat PSB 2014 Problem 9. It is based on testing a simple hypothesis, but reveals and uses a very cute property of the Geometric distribution, which I like to call a sister of loss of memory. Give it a try!

## Problem– ISI MStat PSB 2014 Problem 9

Let $X_1 \sim Geo(p_1)$ and $X_2 \sim Geo(p_2)$ be independent random variables, where $Geo(p)$ refers to the Geometric distribution whose p.m.f. is given by

$f(k)=p(1-p)^k, \quad k=0,1,\ldots$

We are interested in testing the null hypothesis $H_o : p_1=p_2$ against the alternative $H_1: p_1<p_2$. Intuitively it is clear that we should reject $H_o$ if $X_1$ is large, but unfortunately we cannot compute the cut-off, because the distribution of $X_1$ under $H_o$ depends on the unknown common value of $p_1$ and $p_2$.

(a) Let $Y= X_1 +X_2$. Find the conditional distribution of $X_1|Y=y$ when $p_1=p_2$.

(b) Based on the result obtained in (a), derive a level 0.05 test for $H_o$ against $H_1$ when $X_1$ is large.

### Prerequisites

Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution.

Conditional Distribution.

Simple Hypothesis Testing.

## Solution :

Well, part (a) is quite easy, but interesting and elegant, so I'm leaving it as an exercise for you to have the fun. Hint: check whether the required distribution is Discrete Uniform or not! If you are done, proceed.

Now, part (b) is further interesting, because here we will not use the conventional way of analyzing the distributions of $X_1$ and $X_2$; instead we will concentrate on the conditional distribution of $X_1 \mid Y=y$! But why?

One reason for this change of strategy is already given in the question itself; the other is more interesting to observe. If you are done with (a), then by now you have found that the conditional distribution of $X_1 \mid Y=y$ is free of the parameter (i.e. the distribution of $X_1$ loses all information about $p_1$ once we condition on $Y=y$; $p_1=p_2$ is a necessary condition), and this parameter-free conditional distribution is nothing but the Discrete Uniform on $\{0,1,\ldots,y\}$, where $y$ is the observed value of $X_1+X_2$.

So, under $H_o: p_1=p_2$, the distribution of $X_1 \mid Y=y$ is free of the common parameter. And, as stated in the problem itself, it is intuitively clear that a large value of $X_1$ is evidence against $H_o$: a large realized $X_1$ means successes do not come very often, i.e. $p_1$ is smaller.

So there is strong evidence against $H_o$ if $X_1 > c$, for some constant $c \le y$, where $y$ is the observed value of $X_1+X_2$.

So, for a level 0.05 test, we reject $H_o$ for large values of $X_1$, choosing $c$ such that

$P_{H_o}( X_1 > c \mid Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y - 0.05.$

So we reject $H_o$ at level 0.05 when we observe $X_1 > 0.95y - 0.05$, where $y$ is the observed value of $X_1+X_2$. (Since $X_1$ is integer-valued, attaining the size exactly may require randomization; the cut-off above treats it as continuous.) That's it!
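A simulation sketch of the key fact behind this test, that $X_1 \mid Y=y$ is Discrete Uniform on $\{0,1,\ldots,y\}$ (the values $p=0.3$ and $y=6$ are arbitrary choices of mine):

```python
import random

random.seed(3)

def geom(p):
    """Failures before the first success: pmf p(1-p)^k on {0, 1, 2, ...}."""
    k = 0
    while random.random() > p:
        k += 1
    return k

p, y = 0.3, 6                  # illustrative choices
samples = []
while len(samples) < 30_000:
    x1, x2 = geom(p), geom(p)
    if x1 + x2 == y:           # condition on Y = y
        samples.append(x1)

emp = [samples.count(k) / len(samples) for k in range(y + 1)]
print([round(e, 3) for e in emp])   # each entry should be near 1/(y+1)
```

Every cell of the empirical conditional pmf hovers around $\frac{1}{y+1}$, whatever $p$ you pick; that is exactly the parameter-freeness the solution exploits.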

## Food For Thought

Can you show that, for these same $X_1$ and $X_2$,

$P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n),$

taking $p_1=p_2=p$, where $n=0,1,\ldots$? What about the converse? Does it hold? Find out!

But avoid losing memory; its beauty is exclusive to the Geometric (and Exponential)!!


## ISI MStat PSB 2008 Problem 10 | Hypothesis Testing

This is a really beautiful sample problem from ISI MStat PSB 2008 Problem 10. It is based on testing a simple hypothesis. This problem teaches me how observation makes life simple. Go for it!

## Problem– ISI MStat PSB 2008 Problem 10

Consider a population with three kinds of individuals, labelled 1, 2 and 3. Suppose the proportions of individuals of the three types are given by $f(k, \theta)$, $k=1,2,3$, where $0<\theta<1$.

$f(k, \theta) = \begin{cases} {\theta}^2 & k=1 \\ 2\theta(1-\theta) & k=2 \\ (1-\theta)^2 & k=3 \end{cases}$

Let $X_1,X_2,….,X_n$ be a random sample from this population. Find the most powerful test for testing $H_o : \theta =\theta_o$ versus $H_1: \theta = \theta_1$. ($\theta_o< \theta_1< 1$).

### Prerequisites

Binomial Distribution.

Neyman-Pearson Lemma.

Test function and power function.

Hypothesis Testing.

## Solution :

This is a quite beautiful problem, but only when you observe it closely. Here the distribution of $X$ may seem non-standard, but if one observes the distribution of $Y=X-1$ (say) instead of $X$, one finds that $Y \sim \text{Binomial}(2, 1-\theta)$.
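A quick way to convince yourself of this observation is to simulate from $f(k,\theta)$ and compare the empirical pmf of $Y = X - 1$ with the $\text{Binomial}(2, 1-\theta)$ pmf (the value $\theta=0.4$ is an arbitrary choice of mine):

```python
import math
import random
from collections import Counter

random.seed(6)
theta, M = 0.4, 200_000

def draw_x():
    """Sample X in {1, 2, 3} with probabilities theta^2, 2theta(1-theta), (1-theta)^2."""
    u = random.random()
    if u < theta ** 2:
        return 1
    if u < theta ** 2 + 2 * theta * (1 - theta):
        return 2
    return 3

c = Counter(draw_x() - 1 for _ in range(M))        # Y = X - 1
emp = [c[k] / M for k in range(3)]
theo = [math.comb(2, k) * (1 - theta) ** k * theta ** (2 - k) for k in range(3)]
print([round(e, 3) for e in emp])
print([round(t, 3) for t in theo])
```

The two rows should match up to Monte Carlo error, confirming $Y \sim \text{Binomial}(2, 1-\theta)$.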

So now let $p= 1-\theta$, so $0<p<1$, and let $p_o= 1-\theta_o$ and $p_1=1-\theta_1$.

Since $\theta_o< \theta_1$, we have $p_o>p_1$, and our hypotheses reduce to

$H_o : p = p_o$ versus $H_1: p = p_1$, where $1> p_o> p_1$.

So, under $H_o$, the joint pmf (of $Y_i=X_i-1$) is $f_o(\vec{y}) = \prod_{i=1}^n {2 \choose y_i} {p_o}^{y_i}(1-p_o)^{2-y_i}$, and under $H_1$ it is $f_1(\vec{y}) = \prod_{i=1}^n{2 \choose y_i}{p_1}^{y_i}(1-p_1)^{2-y_i}$, where $y_i=x_i-1$, $i=1,\ldots,n$.

So now we can use the widely used Neyman-Pearson Lemma and end up with

$\lambda (\vec{y})=\frac{f_1(\vec{y})}{f_o(\vec{y})}=\frac{\prod_{i=1}^{n} {2 \choose y_i} {p_1}^{y_i} {(1-p_1)}^{2-y_i}}{\prod_{i=1}^n {2 \choose y_i}{p_o}^{y_i}{(1-p_o)}^{2-y_i}}={\left(\frac{p_1}{p_o}\right)}^{\sum y_i} {\left(\frac{1-p_1}{1-p_o}\right)}^{2n-\sum y_i}.$

Now we define a test function $\phi(\vec{x})= \begin{cases} 1& \lambda^*(\vec{x})> k \\ 0 &\lambda^*(\vec{x}) \le k \end{cases}$ for some positive constant $k$,

where $\lambda(\vec{y})=\lambda^*(\vec{x})$, $\vec{x}= (X_1,\ldots,X_n)$.

So our test rule is: reject $H_o$ if $\phi(\vec{x})=1$, and we choose $k$ such that, for a given level $\alpha$ with $0<\alpha<1$,

$E_{H_o}(\phi(\vec{x})) \le \alpha$,

with power function $\beta(\theta)= E_{\theta}(\phi(\vec{x}))$. Can you find the more explicit condition equivalent to $\lambda^*(\vec{x}) \le k$? Try it!

## Food For Thought

Suppose $\theta_o \le \theta_1$. Can you verify that for any constant $c$, $P_{\theta_o}(X>c) \le P_{\theta_1}(X>c)$? Can you generalize the situation: what kind of distribution must $X$ follow? Think over it, until we meet again!


## Life Testing Experiment | ISI MStat 2017 PSB Problem 5

This is a problem from the ISI MStat 2017 Entrance Examination and tests how good your skills are in modeling a life testing experiment using an exponential distribution.

## The Problem:

The lifetime in hours of each bulb manufactured by a particular company follows an independent exponential distribution with mean $\lambda$. We need to test the null hypothesis $H_0: \lambda=1000$ against $H_1:\lambda=500$.
A statistician sets up an experiment with $50$ bulbs, with $5$ bulbs in each of $10$ different locations, to examine their lifetimes.

To get quick preliminary results, the statistician decides to stop the experiment as soon as one bulb fails at each location. Let $Y_i$ denote the lifetime of the first bulb to fail at location $i$. Obtain the most powerful test of $H_0$ against $H_1$ based on $Y_1,Y_2,\ldots,Y_{10}$ and compute its power.

## Prerequisites:

1. Properties of Exponential/Gamma distribution.

2. Order Statistics.

## Solution:

As is clear from the arrangement of the bulbs, the first to fail (among the 5 in a given location) has the smallest lifetime among them.

That is, in more mathematical terms, for a location $i$ we can write $Y_i = \min(X_{i1},X_{i2},\ldots,X_{i5})$.

Here, $X_{ij}$ denotes the lifetime of the $j$th bulb in the $i$th location, where $i=1,2,\ldots,10$ and $j=1,2,\ldots,5$.

It is given that $X_{ij} \sim \text{Exp}(\lambda)$.

Can you see that $Y_i$ is exponential with mean $\frac{\lambda}{5}$? You may try to prove the general result:

if $X_1,\ldots,X_n$ is a random sample from an exponential distribution with mean $\lambda$,

then $X_{(1)}=\min(X_1,\ldots,X_n)$ is exponential with mean $\frac{\lambda}{n}$.

So now we have $Y_1,Y_2,\ldots,Y_{10}$ in hand, each exponential with mean $\frac{\lambda}{5}$.

The joint pdf is $f(\mathbf{y})=\left(\frac{5}{\lambda}\right)^{10} e^{-\frac{5}{\lambda}\sum_{i=1}^{10} y_i}$.

For testing $H_0: \lambda=1000$ against $H_1:\lambda=500$, we use the Neyman Pearson Lemma.

We have the critical region of the most powerful test as $\frac{f_{H_1}(\mathbf{y})}{f_{H_0}(\mathbf{y})} >c$,

which after simplification comes out to be $\bar{Y} < K$, where $K$ is an appropriate constant. (Mind the direction: $H_1$ specifies the smaller mean, so the likelihood ratio is decreasing in $\sum y_i$.)

Also, see that $\bar{Y} \sim \text{Gamma}\left(10, \frac{\lambda}{50}\right)$ (shape 10, scale $\frac{\lambda}{50}$); equivalently, $\frac{10}{\lambda}\sum_{i=1}^{10} Y_i \sim \chi^2_{20}$.

Can you use this fact to find the value of $K$ using the size ($\alpha$) criterion? (Exercise for the reader.)

Also, find the power of the test.
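Here is a simulation sketch of those last two steps (the seed and Monte Carlo size are my own choices). Since $H_1$ specifies the smaller mean, the test rejects for small $\bar{Y}$, so $K$ is the lower $\alpha$-quantile of $\bar{Y}$ under $H_0$:

```python
import random

random.seed(4)
alpha, M = 0.05, 100_000

def ybar(mean):
    """Mean of the 10 location minima; each minimum of 5 bulbs ~ Exp(mean / 5)."""
    return sum(random.expovariate(5 / mean) for _ in range(10)) / 10

h0 = sorted(ybar(1000) for _ in range(M))   # H0: mean lifetime 1000
K = h0[int(alpha * M)]                      # reject H0 when Ybar < K
power = sum(ybar(500) < K for _ in range(M)) / M
print(round(K, 1), round(power, 3))
```

Comparing the simulated $K$ against the exact Gamma (or $\chi^2_{20}$) quantile is a good way to check your closed-form answer.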

## Challenge Problem:

The exponential distribution is used widely to model lifetime of appliances. The following scenario is based on such a model.

Suppose electric bulbs have a lifetime distribution with pdf $f(t)=\lambda e^{-\lambda t}$ where $t \in [0, \infty)$ .

These bulbs are used individually for street lighting on a large number of posts. A bulb is replaced immediately after it burns out.

Let's break the problem into steps.

(i) Starting from time $t=0$, the process is observed till $t=T$. Can you calculate the expected number of replacements at a post during the interval $(0,T)$?

(ii) Hence, deduce $g(t)\,\text{dt}$, the probability of a bulb being replaced in $(t, t+\text{dt})$ for $t < T$, irrespective of when the bulb was put in.

(iii) Next, suppose that at the end of the first interval of time $T$, all bulbs which were put in the posts before time $X < T$ and have not burned out are replaced by new ones, but the bulbs replaced after time $X$ continue to be used, provided, of course, that they have not burned out.

Prove that, with such a mixture of old and new bulbs, the probability of a bulb lasting longer than $\tau$ in the second interval of length $T$ is given by

$S_2(\tau)=\frac{1}{2}e^{-\lambda \tau}(1+ e^{-\lambda X})$

Also, try proving the general case where the lifetimes of the bulbs follow the pdf $f(t)$ . Here, $f(t)$ need not be the pdf of an exponential distribution .

You should get $S_2(\tau)=(1-p)S_1(\tau) + \int_{0}^{x} g(T-x)S_1(x)S_1(\tau +x)\, \text{dx}$ for $\tau<T$,

where $p$ is the proportion of bulbs not replaced at time $t=T$ and $S_1(t)$ is the probability that a bulb has lifetime $> t$.


## Neyman Welcomes You | ISI MStat 2018 PSB Problem 8

This is a problem from the ISI MStat Examination, 2018. It involves the construction of the most powerful test of size $\alpha$ using the Neyman-Pearson Lemma. The aim is to find its critical region in terms of quantiles of a standard distribution.

## Problem

Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from $f(x;\theta)$, $\theta \in \{0,1\}$, with

$f(x;0) = \begin{cases} 1 & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$

and $f(x,1)= \begin{cases} \frac{1}{2 \sqrt{x}} & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$
Based on the above sample, obtain the most powerful test for testing $H_0:\theta=0$ against $H_1: \theta=1$, at level $\alpha$, with $0 < \alpha <1$. Find the critical region in terms of quantiles of a standard distribution.

## Prerequisites

1. The Fundamental Neyman Pearson Lemma

2. Useful Transformations of Random Variables

3. Properties of standard probability distributions (e.g. Normal, Chi-squared, etc.)

All these topics are included in the regular coursework of undergraduate statistics students. If not, one may refer to standard texts like Casella and Berger.

## Solution

As $X_1,X_2,\ldots,X_n$ is a random sample, they are independent by definition.
So, their joint pdf when $\theta=0$ is given by $f(\textbf{x},0)= \prod_{i=1}^{n} 1_{0<x_i<1}$, where $1_{0<x_i<1}$ denotes the indicator of the interval $(0,1)$.

Similarly, the joint pdf when $\theta=1$ is given by:
$f(\textbf{x},1)=\frac{1}{2^n \prod_{i=1}^{n}\sqrt{x_i}} . \prod_{i=1}^{n}1_{0 <x_i<1}$

According to the Fundamental Neyman-Pearson Lemma, the most powerful size $\alpha$ test for testing $H_{0}$ vs $H_{1}$ is given by the test function $\phi$ as follows:

$\phi=\begin{cases} 1 & \text{if} \ \frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k \\ 0 & \text{otherwise} \\ \end{cases}$

where k is such that $E_{H_0}(\phi)=\alpha$.

So, our test criterion is $\frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k$.
Plugging in the pdfs, we get the criterion as $\prod_{i=1}^{n} X_i < \frac{1}{2^{2n}k^2} = \lambda$ (say).

Our aim now is to find the value of $\lambda$ from the given size $\alpha$ criterion,
Thus,

$P_{H_0}(\prod_{i=1}^{n}X_i < \lambda)=\alpha$

$\iff P_{H_{0}}(\sum_{i=1}^{n} \ln{X_i} < \ln{\lambda}) =\alpha$

$\iff P_{H_{0}}(-2.\sum_{i=1}^{n} \ln{X_i} >-2. \ln{\lambda}) =\alpha$

Now we state a result: if $X_i \sim U(0,1)$, then $-2 \ln{X_i} \sim \chi^2_{2}$ (prove it yourself!).

As the $X_i$'s are independent, by the reproductive property of the chi-squared distribution, $-2\sum_{i=1}^{n} \ln{X_i} \sim \chi^2_{2n}$.
Hence, we simply need the value of $\lambda$ such that $P_{H_0}\left(-2\sum_{i=1}^{n} \ln{X_i} > -2 \ln{\lambda}\right)=\alpha$.
The obvious choice is $-2 \ln{\lambda} = \chi^2_{2n, \alpha}$, where $\chi^2_{2n, \alpha}$ is the upper $\alpha$ point of the $\chi^2_{2n}$ distribution.

So $-2 \ln{\lambda} = \chi^2_{2n,\alpha}$ implies $\lambda =e^{-\frac{1}{2}\chi^2_{2n,\alpha}}$.
So, our critical region for this test is $\prod_{i=1}^{n} X_i < e^{-\frac{1}{2} \chi^2_{2n,\alpha}}$.
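As a sanity check of this critical region, the sketch below estimates the upper $\alpha$ point of $\chi^2_{2n}$ by Monte Carlo (using the very fact that $-2\sum \ln U_i \sim \chi^2_{2n}$ for uniforms) and confirms that the resulting test has size close to $\alpha$; $n$, $\alpha$ and the seed are arbitrary choices of mine:

```python
import math
import random

random.seed(5)
n, alpha, M = 5, 0.10, 100_000

# Empirical upper-alpha point of chi^2_{2n}, built from -2 * sum(log U_i)
stats = sorted(-2 * sum(math.log(random.random()) for _ in range(n))
               for _ in range(M))
q = stats[int((1 - alpha) * M)]
lam = math.exp(-q / 2)          # critical value for prod(X_i)

# Fresh uniforms under H0: rejection rate should be close to alpha
rate = sum(math.prod(random.random() for _ in range(n)) < lam
           for _ in range(M)) / M
print(round(rate, 3))
```

The printed rejection rate hovering around $\alpha$ confirms that the quantile-based cut-off has the intended size.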

## Food For Thought

In this problem, look at the supports of the two distributions under the null and alternative hypotheses.
See that both supports are the same, and hence the quantity $\frac{f_1}{f_0}$ is defined everywhere.
But suppose for a problem the two supports are not the same and not disjoint; then try constructing a most powerful test using the Neyman-Pearson Lemma.
For Example:
Let the family of distributions be $\{U(0,\theta) : \theta > 0\}$.
Find the most powerful test for testing $H_0 : \theta=1$ against $H_1: \theta=2$.
Note that the supports under null and alternative hypotheses are not the same in this case.
Give it a try!