
## Testing of Hypothesis | ISI MStat 2016 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2016. It involves the basic idea of the Type 1 error in testing of hypothesis, but focuses on the fundamental relationship between the Exponential distribution and the Geometric distribution.

## The Problem:

Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from an exponential distribution with mean $\lambda$.

Assume that the observed data is available on $\left[X_{1}\right], \ldots,\left[X_{n}\right]$, instead of $X_{1}, \ldots, X_{n},$ where $[x]$ denotes the largest integer less than or equal to $x$.

Consider a test for $H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$ which rejects $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Given $\alpha \in(0,1),$ obtain values of $c_{n}$ such that the size of the test converges to $\alpha$ as $n \rightarrow \infty$.

## Prerequisites:

(a) Testing of Hypothesis

(b) Type 1 Error

(c) Exponential Distribution

(d) Relationship of Exponential Distribution and Geometric Distribution

(e) Central Limit Theorem

## Solution:

• If X ~ Exponential($\lambda$) with density $\lambda e^{-\lambda x}$, then $Y = [\frac{X}{a}]$ ~ Geom($p$), where $p = 1-e^{-\lambda a} \in(0,1)$

Proof:

$Y$ is clearly discrete taking values in the set of non-negative integers, due to the flooring. Then, for any integer $n \geq 0$ we have
$P(Y=n)=P(X \in[an,\, a(n+1)))=\int_{an}^{a(n+1)} \lambda e^{-\lambda x}\, dx = e^{-\lambda a n}\left(1-e^{-\lambda a}\right)=(1-p)^{n} p$
where $p=1-e^{-\lambda a} \in(0,1),$ as $\lambda>0$ and $a>0$.
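The relationship above can be checked numerically. Below is a minimal simulation sketch; the parameter choices and variable names are illustrative, not part of the original problem:

```python
import numpy as np

# Check that Y = floor(X / a), for X ~ Exponential with rate lam,
# matches the Geom(p) pmf with p = 1 - exp(-lam * a).
rng = np.random.default_rng(0)
lam, a, N = 1.0, 1.0, 200_000

x = rng.exponential(scale=1.0 / lam, size=N)   # rate lam -> scale 1/lam
y = np.floor(x / a).astype(int)

p = 1.0 - np.exp(-lam * a)
# Compare the empirical pmf with (1-p)^n * p for a few small n
for n in range(4):
    emp = np.mean(y == n)
    theo = (1.0 - p) ** n * p
    print(f"n={n}: empirical {emp:.4f}  vs  theoretical {theo:.4f}")
```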

• If $X_i$ ~ Geom($p$) independently, then $\sum_{i = 1}^{n} X_i$ ~ NBinom($n,p$)
• If $X_i$ ~ Exponential($\lambda$), then $S_n = \sum_{i=1}^{n}\left[X_{i}\right]$ ~ NBinom($n,p$), where $p = 1-e^{-\lambda} \in(0,1)$ (taking $a=1$ above)
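A quick simulation can also confirm the negative binomial moments. This sketch (with illustrative parameters) compares the empirical mean and variance of a sum of geometric variables with $n(1-p)/p$ and $n(1-p)/p^2$:

```python
import numpy as np

# A sum of n independent Geom(p) variables (counting failures) has the
# NBinom(n, p) mean n(1-p)/p and variance n(1-p)/p^2.
rng = np.random.default_rng(1)
n, p, N = 20, 1.0 - np.exp(-1.0), 100_000

# numpy's geometric counts trials (support 1, 2, ...); subtract 1 for failures
s = (rng.geometric(p, size=(N, n)) - 1).sum(axis=1)

print(s.mean(), n * (1 - p) / p)        # both close to n(1-p)/p
print(s.var(),  n * (1 - p) / p**2)     # both close to n(1-p)/p^2
```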

#### Testing of Hypothesis

$H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$

We reject $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Here, the size of the test, i.e., the Type 1 error (since the null hypothesis is simple), is $\alpha_n = P(S_n > c_{n} \mid \lambda=1)$.

We want to select $c_n$ such that $\alpha_n \to \alpha$.

$S_n$ ~ NBinom($n,p$), where $p = 1-e^{-1}$ under $H_0$.

Now, $\frac{\sqrt{n}\left(\frac{S_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}} \rightarrow Z \sim N(0,1)$ by the Central Limit Theorem, since a Geom($p$) variable counting failures has mean $\frac{1-p}{p}$ and variance $\frac{1-p}{p^2}$.

Observe that thus, $\alpha_n = P(S_n > c_{n} \mid \lambda=1) \rightarrow P\left(Z > \frac{\sqrt{n}\left(\frac{c_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}}\right) = \alpha$.

Thus, $\frac{\sqrt{n}\left(\frac{c_n}{n} - \frac{1-p}{p}\right)}{\sqrt{\frac{1-p}{p^2}}} = z_{\alpha}$, the upper $\alpha$ point of $N(0,1)$.

Solving, $c_n = n\,\frac{1-p}{p} + z_{\alpha}\,\frac{\sqrt{n(1-p)}}{p}$, where $p = 1-e^{-1}$.
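As a sanity check, the following sketch computes $c_n$ from the normal approximation, using the failure-count mean $(1-p)/p$ and variance $(1-p)/p^2$, and estimates the size by simulation. The constant $z_{0.05} \approx 1.6449$ is hardcoded and all parameter values are illustrative:

```python
import numpy as np

# Estimate the size of the test "reject when sum of floor(X_i) > c_n"
# under H0: lambda = 1, with c_n from the normal approximation.
rng = np.random.default_rng(2)
alpha, n, N = 0.05, 500, 20_000

p = 1.0 - np.exp(-1.0)                  # p under H0 (lambda = 1)
z_alpha = 1.6449                        # upper 5% point of N(0,1), hardcoded
c_n = n * (1 - p) / p + z_alpha * np.sqrt(n * (1 - p)) / p

x = rng.exponential(1.0, size=(N, n))   # under H0 the mean is 1
s = np.floor(x).sum(axis=1)
print("empirical size:", np.mean(s > c_n))   # should be near alpha = 0.05
```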

## Food for Thought

If X ~ Exponential($\lambda$), then what is the distribution of $\{X\}$, the fractional part of $X$? This question is crucial in getting back the Exponential distribution from the Geometric distribution.

In other words, the food for thought asks how we can recover the Exponential distribution from the Geometric distribution.

Stay Tuned. Stay Blessed! See you in the next post.


## ISI MStat PSB 2009 Problem 8 | How big is the Mean?

This is a very simple and regular sample problem from ISI MStat PSB 2009 Problem 8. It is based on testing the nature of the mean of the Exponential distribution. Give it a try!

## Problem– ISI MStat PSB 2009 Problem 8

Let $X_1,\ldots,X_n$ be i.i.d. observations from the density,

$f(x)=\frac{1}{\mu}exp(-\frac{x}{\mu}) , x>0$

where $\mu >0$ is an unknown parameter.

Consider the problem of testing the hypothesis $H_o : \mu \le \mu_o$ against $H_1 : \mu > \mu_o$.

(a) Show that the test with critical region $[\bar{X} \ge \mu_o {\chi_{2n,1-\alpha}}^2/2n]$, where ${\chi^2}_{2n,1-\alpha}$ is the $(1-\alpha)$th quantile of the ${\chi^2}_{2n}$ distribution, has size $\alpha$.

(b) Give an expression of the power in terms of the c.d.f. of the ${\chi^2}_{2n}$ distribution.

### Prerequisites

Likelihood Ratio Test

Exponential Distribution

Chi-squared Distribution

## Solution :

This problem is quite regular and simple. From the given form of the hypotheses, it is almost clear that using Neyman-Pearson directly can land you in trouble, since the hypotheses are composite. So, let's go for something more general, that is, Likelihood Ratio Testing.

Hence, the Likelihood function of the $\mu$ for the given sample is ,

$L(\mu | \vec{X})=(\frac{1}{\mu})^n exp(-\frac{\sum_{i=1}^n X_i}{\mu}) , \mu>0$; also observe that the sample mean $\bar{X}$ is the MLE of $\mu$.

So, the Likelihood Ratio statistic is,

$\lambda(\vec{x})=\frac{\sup_{\mu \le \mu_o}L(\mu |\vec{x})}{\sup_\mu L(\mu |\vec{x})} \\ =\begin{cases} 1 & \mu_o \ge \bar{X} \\ \frac{L(\mu_o|\vec{x})}{L(\bar{X}|\vec{x})} & \mu_o < \bar{X} \end{cases}$

So, our test function is ,

$\phi(\vec{x})=\begin{cases} 1 & \lambda(\vec{x})<k \\ 0 & otherwise \end{cases}$.

We, reject $H_o$ at size $\alpha$, when $\phi(\vec{x})=1$, for some $k$, $E_{H_o}(\phi) \le \alpha$,

Hence, $\lambda(\vec{x}) < k \\ \Rightarrow L(\mu_o|\vec{x})<kL(\bar{X}|\vec{x}) \\ \Rightarrow -n \ln \mu_o -\frac{n\bar{X}}{\mu_o} < \ln k -n \ln \bar{X} -n \\ \Rightarrow n \ln \bar{X}-\frac{n\bar{X}}{\mu_o} < K^*,$

for some constant $K^*$.

Let $g(\bar{x})=n\ln \bar{x} -\frac{n\bar{x}}{\mu_o}$, and observe that $g$ is a

decreasing function of $\bar{x}$ for $\bar{x} \ge \mu_o$, since $g'(\bar{x})=\frac{n}{\bar{x}}-\frac{n}{\mu_o} \le 0$ there.

Hence, there exists a $c$ such that for $\bar{x} \ge c$, we have $g(\bar{x}) < K^*$. See the figure.

So, the critical region of the test is of form $\bar{X} \ge c$, for some $c$ such that,

$P_{H_o}(\bar{X} \ge c)=\alpha$, where $\alpha \in (0,1)$ is the size of the test.

Now, our task is to find $c$, and for that observe, if $X \sim Exponential(\theta)$, then $\frac{2X}{\theta} \sim {\chi^2}_2$,

Hence, in this problem, since the $X_i$'s follow $Exponential(\mu)$, we have $\frac{2n\bar{X}}{\mu} \sim {\chi^2}_{2n}$, so,

$P_{H_o}(\bar{X} \ge c)=\alpha \\ \Rightarrow P_{H_o}(\frac{2n\bar{X}}{\mu_o} \ge \frac{2nc}{\mu_o})=\alpha \\ \Rightarrow P_{H_o}({\chi^2}_{2n} \ge \frac{2nc}{\mu_o})=\alpha$,

which gives $c=\frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}$,

Hence, the rejection region is indeed $\left[\bar{X} \ge \frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}\right]$.

Hence Proved !
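Part (a) can also be verified by simulation. The sketch below stays numpy-only by estimating the chi-squared quantile from a Monte Carlo sample; the values of $n$, $\mu_o$ and $\alpha$ are illustrative assumptions:

```python
import numpy as np

# Check that P(Xbar >= mu0 * chi2_{2n,1-alpha} / (2n)) = alpha at mu = mu0.
rng = np.random.default_rng(3)
n, mu0, alpha, N = 10, 2.0, 0.05, 50_000

# Estimate the (1 - alpha) quantile of chi^2_{2n} by Monte Carlo
# (scipy.stats.chi2.ppf would give it directly, if scipy is available).
q = np.quantile(rng.chisquare(2 * n, size=500_000), 1 - alpha)

xbar = rng.exponential(mu0, size=(N, n)).mean(axis=1)   # data with mu = mu0
reject = xbar >= mu0 * q / (2 * n)
print("empirical size:", np.mean(reject))   # should be near alpha
```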

(b) Now, we know that the power of the test is,

$\beta= E_{\mu}(\phi) \\ = P_{\mu}(\lambda(\vec{x})<k)=P_{\mu}(\bar{X} \ge \frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}) \\ \Rightarrow \beta = P_{\mu}({\chi^2}_{2n} \ge \frac{\mu_o}{\mu}{\chi^2}_{2n;1-\alpha})$.

Hence, the power of the test can be expressed in terms of the c.d.f. of the chi-squared distribution.

## Food For Thought

Can you use any other testing procedure to conduct this test ?


## ISI MStat PSB 2013 Problem 7 | Bernoulli interferes Normally

This is a very simple and beautiful sample problem from ISI MStat PSB 2013 Problem 7. It is mainly based on simple hypothesis testing of normal variables, just modified with a Bernoulli random variable. Try it!

## Problem– ISI MStat PSB 2013 Problem 7

Suppose $X_1$ and $X_2$ are two independent and identically distributed random variables following $N(\theta, 1)$. Further consider a Bernoulli random variable $V$ with $P(V=1)=\frac{1}{4}$ which is independent of $X_1$ and $X_2$. Define $X_3$ as,

$X_3 = \begin{cases} X_1 & when & V=0 \\ X_2 & when & V=1 \end{cases}$

For testing $H_o: \theta= 0$ against $H_1: \theta=1$ consider the test:

Reject $H_o$ if $\frac{(X_1+X_2+X_3)}{3} >c$.

Find $c$ such that the test has size $0.05$.

### Prerequisites

Normal Distribution

Simple Hypothesis Testing

Bernoulli Trials

## Solution :

This problem is simple enough; the only trick is to observe that the test rule is based on 3 random variables, $X_1, X_2$ and $X_3$, but $X_3$ in turn depends on the Bernoulli variable $V$.

So, here it is given that we reject $H_o$ at size $0.05$ if $\frac{(X_1+X_2+X_3)}{3}> c$ such that,

$P_{H_o}(\frac{X_1+X_2+X_3}{3}>c)=0.05$

So, Using law of Total Probability as, $X_3$ is conditioned on $V$,

$P_{H_o}(X_1+X_2+X_3>3c|V=0)P(V=0)+P_{H_o}(X_1+X_2+X_3>3c|V=1)P(V=1)=0.05$

$\Rightarrow P_{H_o}(2X_1+X_2>3c)\frac{3}{4}+P_{H_o}(X_1+2X_2>3c)\frac{1}{4}=0.05$ [ remember, $X_1$, and $X_2$ are independent of $V$].

Now, under $H_o$, $2X_1+X_2 \sim N(0,5)$ and $X_1+2X_2 \sim N(0,5)$,

So, the rest is quite obvious and easy to figure out, and I leave it as an exercise!!
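Since both conditional probabilities above involve a $N(0,5)$ variable, the size condition reduces to $P(N(0,5) > 3c) = 0.05$, suggesting $c = z_{0.05}\sqrt{5}/3 \approx 1.226$. A simulation sketch of this (with $z_{0.05} \approx 1.6449$ hardcoded; offered as a check, not the exam answer key):

```python
import numpy as np

# Size condition reduces to P(N(0,5) > 3c) = 0.05, so 3c = 1.6449 * sqrt(5).
rng = np.random.default_rng(4)
c = 1.6449 * np.sqrt(5) / 3

N = 200_000
x1 = rng.normal(0.0, 1.0, N)           # theta = 0 under H0
x2 = rng.normal(0.0, 1.0, N)
v = rng.random(N) < 0.25               # Bernoulli(1/4)
x3 = np.where(v, x2, x1)               # X3 = X2 if V = 1, else X1
print("empirical size:", np.mean((x1 + x2 + x3) / 3 > c))  # near 0.05
```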

## Food For Thought

Let's end this discussion with some exponentials.

Suppose $X_1,X_2,\ldots,X_n$ is a random sample from $exponential(\theta)$ and $Y_1,Y_2,\ldots,Y_m$ is another random sample from an $exponential(\mu)$ population. Now you are to test $H_o: \theta=\mu$ against $H_1: \theta \neq \mu$.

Can you show that the test can be based on the statistic $T= \frac{\sum X_i}{\sum X_i +\sum Y_i}$?

What distribution do you think $T$ should follow under the null hypothesis? Think it over!!


## ISI MStat PSB 2014 Problem 9 | Hypothesis Testing

This is another beautiful sample problem from ISI MStat PSB 2014 Problem 9. It is based on testing a simple hypothesis, but reveals and uses a very cute property of the Geometric distribution, which I prefer calling a sister to loss of memory. Give it a try!

## Problem– ISI MStat PSB 2014 Problem 9

Let $X_1 \sim Geo(p_1)$ and $X_2 \sim Geo(p_2)$ be independent random variables, where $Geo(p)$ refers to the Geometric distribution whose p.m.f. is given by,

$f(k)=p(1-p)^k, \quad k=0,1,\ldots$

We are interested in testing the null hypothesis $H_o : p_1=p_2$ against the alternative $H_1: p_1<p_2$. Intuitively it is clear that we should reject if $X_1$ is large, but unfortunately, we cannot compute the cut-off because the distribution of $X_1$ under $H_o$ depends on the unknown common value of $p_1$ and $p_2$.

(a) Let $Y= X_1 +X_2$. Find the conditional distribution of $X_1|Y=y$ when $p_1=p_2$.

(b) Based on the result obtained in (a), derive a level 0.05 test for $H_o$ against $H_1$ that rejects when $X_1$ is large.

### Prerequisites

Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution.

Conditional Distribution.

Simple Hypothesis Testing.

## Solution :

Well, part (a) is quite easy, but interesting and elegant, so I'm leaving it as an exercise for you to have the fun. Hint: verify whether the required distribution is Discrete Uniform or not! If you are done, proceed.

Now, part (b) is further interesting, because here we will not use the conventional way of analyzing the distributions of $X_1$ and $X_2$; instead we will concentrate on the conditional distribution of $X_1 | Y=y$! But why?

The reason behind this adaptation of strategy: one reason is already given in the question itself, but the other is more interesting to observe. If you are done with (a), then by now you have found that the conditional distribution of $X_1|Y=y$ is free of any parameter (i.e. the distribution of $X_1$ loses all information about the parameter $p_1$ when conditioned on $Y=y$, provided $p_1=p_2$). This parameter-free conditional distribution is nothing but the Discrete Uniform on $\{0,1,\ldots,y\}$, where $y$ is the observed value of $X_1+X_2$.

So, under $H_o: p_1=p_2$, the distribution of $X_1|Y=y$ is free of the common parameter. And clearly, as stated in the problem itself, it is intuitively understandable that a large value of $X_1$ exhibits evidence against $H_o$: a large realized value of $X_1$ means the success doesn't come very often, i.e. $p_1$ is smaller.

So, there will be strong evidence against $H_o$ if $X_1 > c$, for some constant $c \le y$, where $y$ is the observed value of $X_1+X_2$.

So, for a level 0.05 test, the test will reject $H_o$ for large values of $X_1$, such that,

$P_{H_o}( X_1 > c| Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y – 0.05 .$

So, we reject $H_o$ at level 0.05 when we observe $X_1 > 0.95y - 0.05$, where $y$ is the observed value of $X_1+X_2$. That's it!
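A simulation sketch of this test (illustrative parameters; note that the exact level 0.05 is attained when $0.05(y+1)$ is an integer, e.g. $y = 19$):

```python
import numpy as np

# Under H0 with p1 = p2 = p, condition on X1 + X2 = y and check both
# part (a) and the conditional rejection rate of the level 0.05 test.
rng = np.random.default_rng(5)
p, N, y = 0.2, 400_000, 19      # 0.05 * (y + 1) is an integer for y = 19

x1 = rng.geometric(p, N) - 1    # failures before the first success
x2 = rng.geometric(p, N) - 1
cond = x1[x1 + x2 == y]

# part (a): X1 | Y = y should be Discrete Uniform on {0, ..., y}
print("conditional mean:", cond.mean(), " (uniform gives", y / 2, ")")
# part (b): rejection rate of X1 > 0.95*y - 0.05 given Y = y
print("conditional rejection rate:", np.mean(cond > 0.95 * y - 0.05))
```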

## Food For Thought

Can you show that for this same $X_1$ and $X_2$ ,

$P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n)$

considering $p_1=p_2=p$, where $n=0,1,\ldots$. What about the converse? Does it hold? Find out!
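The identity can be checked exactly for particular values of $p$ and $n$, using $P(X_1 \le n) = 1-(1-p)^{n+1}$ and $P(X_1+X_2=k) = (k+1)p^2(1-p)^k$. This is a numerical sketch, not the requested proof:

```python
# Check: P(X1 <= n) - P(X1 + X2 <= n) = ((1-p)/p) * P(X1 + X2 = n),
# with X1, X2 iid Geom(p) on {0, 1, ...} and X1 + X2 ~ NBinom(2, p).
p, n = 0.4, 7
q = 1.0 - p

lhs = (1.0 - q ** (n + 1)) - sum((k + 1) * p**2 * q**k for k in range(n + 1))
rhs = (q / p) * (n + 1) * p**2 * q**n
print(lhs, rhs)   # the two sides agree up to floating-point error
```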

But avoid losing memory; its beauty is exclusive to the Geometric (and Exponential)!!


## ISI MStat 2016 Problem 10 | PSB Sample | It’s a piece of cake!

This is a sample problem from ISI MStat 2016 Problem 10, which tests the student’s ability to write a model and then test the equality of parameters in it using appropriate statistics.

## ISI MStat 2016 Problem 10:

A cake weighing one kilogram is cut into two pieces, and each piece is weighed separately. Denote the measured weights of the two pieces by $X$ and $Y$ . Assume that the errors in obtaining $X$ and $Y$ are independent and normally distributed with mean zero and the same (unknown) variance. Devise a test for the hypothesis that the true weights of the two pieces are equal.

## Prerequisites:

1. Testing of Hypothesis

2. Model formation

## Solution:

Let us write the two cases in the form of a model:

$X= \mu_1 + \epsilon_1$

$Y = \mu_2 + \epsilon_2$

where, $\mu_1,\mu_2$ are the true weights of the two slices and $\epsilon_1 , \epsilon_2 \sim N(0, \sigma^2)$ (independently).

So, you get $X \sim N(\mu_1,\sigma^2)$ and $Y \sim N(\mu_2, \sigma^2 )$.

Also, see that $X,Y$ are independent.

So, we need to test $H_0: \mu_1=\mu_2 =\frac{1}{2}$ against $H_1: \mu_1 \neq \mu_2$.

See that, under $H_0$, $X-Y \sim N(0,2 \sigma^2)$

So, $\frac{X-Y}{\sqrt{2} \sigma} \sim N(0,1)$.

But have you noticed that $\sigma$ is unknown? So this isn’t a statistic after all.

Can you replace $\sigma$ by an appropriate quantity so that you can conduct the test?
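One possible route, offered here only as a sketch and not necessarily the intended answer: since the true weights sum to one, $X+Y-1 = \epsilon_1+\epsilon_2 \sim N(0, 2\sigma^2)$, and it is independent of $X-Y$ under $H_0$; so the ratio $(X-Y)/(X+Y-1)$ is standard Cauchy under $H_0$, whatever $\sigma$ is. A quick simulation check (with an arbitrary illustrative $\sigma$):

```python
import numpy as np

# Under H0 both true weights are 1/2; T = (X - Y)/(X + Y - 1) is then a
# ratio of independent N(0, 2 sigma^2) variables, i.e. standard Cauchy.
rng = np.random.default_rng(6)
sigma, N = 0.03, 300_000

eps1 = rng.normal(0, sigma, N)
eps2 = rng.normal(0, sigma, N)
x = 0.5 + eps1
y = 0.5 + eps2
t = (x - y) / (x + y - 1.0)

# For a standard Cauchy, P(|T| > 1) = 1/2 exactly
print("P(|T| > 1):", np.mean(np.abs(t) > 1.0))
```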

## Food For Thought:

Okay, let’s move from cakes to doughnuts!!

Yeah, I know this is off topic and nothing related to statistics, but it's good for the brain to alter cuisines once in a while!

This is the famous doughnut slicing problem:

What is the largest number of pieces you can slice a doughnut into using only 3 cuts? (Note that you can only make planar cuts and you are not allowed to rearrange the pieces between the cuts)

I would request you to try this on your own without looking up solutions directly.


## Size, Power, and Condition | ISI MStat 2019 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2019. It primarily tests one's familiarity with the size and power of a test, and whether one can condition on an event properly.

## The Problem:

Let Z be a random variable with probability density function

$f(z)=\frac{1}{2} e^{-|z- \mu|} , z \in \mathbb{R}$

with parameter $\mu \in \mathbb{R}$. Suppose, we observe $X =$ max $(0,Z)$.

(a) Find the constant $c$ such that the test that "rejects when $X>c$" has size 0.05 for the null hypothesis $H_0 : \mu=0$.

(b) Find the power of this test against the alternative hypothesis $H_1: \mu =2$.

## Prerequisites:

• A thorough knowledge about the size and power of a test
• Having a good sense of conditioning whenever a function (like max()) is defined piecewise.

And believe me as Joe Blitzstein says: “Conditioning is the soul of statistics”

## Solution:

(a) If you know what size of a test means, then you can easily write down the condition mentioned in part(a) in mathematical terms.

It simply means $P_{H_0}(X>c)=0.05$

Now, under $H_0$, $\mu=0$.

So, we have the pdf of Z as $f(z)=\frac{1}{2} e^{-|z|}$

As the support of Z is $\mathbb{R}$, we can partition it into $\{Z \ge 0\}$ and $\{Z <0 \}$.

Now, let’s condition based on this partition. So, we have:

$P_{H_0}(X > c)=P_{H_0}(X>c , Z \ge 0)+ P_{H_0}(X>c, Z<0) =P_{H_0}(X>c , Z \ge 0) =P_{H_0}(Z > c)$

Do you understand the last equality? (Try to convince yourself why.)

So, $P_{H_0}(X >c)=P_{H_0}(Z > c)=\int_{c}^{\infty} \frac{1}{2} e^{-|z|} dz = \frac{1}{2}e^{-c}$

Equating $\frac{1}{2}e^{-c}$ with 0.05, we get $c= \ln{10}$

(b) The second part is mere calculation, given that you already know the value of c.

Power of test against $H_1$ is given by:

$P_{H_1}(X>\ln{10})=P_{H_1}(Z > \ln{10})=\int_{\ln{10}}^{\infty} \frac{1}{2} e^{-|z-2|} dz = \frac{e^2}{20}$
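Both parts can be double-checked by simulation; the sketch below uses numpy's built-in Laplace sampler (sample sizes are illustrative):

```python
import numpy as np

# Check the cutoff c = ln(10) (size 0.05) and the power e^2 / 20 at mu = 2.
rng = np.random.default_rng(7)
N = 400_000
c = np.log(10.0)

z0 = rng.laplace(loc=0.0, scale=1.0, size=N)   # H0: mu = 0
x0 = np.maximum(0.0, z0)                       # X = max(0, Z)
print("size:", np.mean(x0 > c), " target 0.05")

z1 = rng.laplace(loc=2.0, scale=1.0, size=N)   # H1: mu = 2
x1 = np.maximum(0.0, z1)
print("power:", np.mean(x1 > c), " target", np.exp(2) / 20)
```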

## Try out this one:

The pdf occurring in this problem is an example of a Laplace distribution. Look it up on the internet if you are not aware of it, and go through its properties.

Suppose you have a random variable V which follows Exponential Distribution with mean 1.

Let I be a Bernoulli($\frac{1}{2}$) random variable. It is given that I,V are independent.

Can you find a function $h=h(I,V)$ of $I$ and $V$ (which is then itself a random variable, and continuous in $V$) such that $h$ has the standard Laplace distribution?


## Neyman Welcomes You | ISI MStat 2018 PSB Problem 8

This is a problem from ISI MStat Examination,2018. It involves the construction of the most powerful test of size alpha using Neyman Pearson Lemma. The aim is to find its critical region in terms of quantiles of a standard distribution.

## Problem

Let $X_1, X_2, \ldots, X_n$ be an i.i.d sample from $f(x;\theta)$, $\theta \in \{0,1\}$, with

$f(x;0) = \begin{cases} 1 & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$

and $f(x,1)= \begin{cases} \frac{1}{2 \sqrt{x}} & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$
Based on the above sample, obtain the most powerful test for testing $H_0:\theta=0$ against $H_1: \theta=1$, at level $\alpha$, with $0 < \alpha <1$. Find the critical region in terms of quantiles of a standard distribution.

## Prerequisites

1. The Fundamental Neyman Pearson Lemma

2. Useful Transformations of Random Variables

3. Properties of standard probability distributions (e.g. Normal,Chi-squared etc)

All these topics are included in the regular coursework of undergraduate statistics students. If not, one may refer to standard texts like Casella and Berger.

## Solution

As, $X_1,X_2,..X_n$ is a random sample, they are independent by definition.
So, their joint pdf when $\theta=0$ is given by $f(\textbf{x},0)= \prod_{i=1}^{n} 1_{0<x_i<1}$, where $1_{0<x_i<1}$ denotes the indicator function of the interval $(0,1)$.

Similarly, the joint pdf when $\theta=1$ is given by:
$f(\textbf{x},1)=\frac{1}{2^n \prod_{i=1}^{n}\sqrt{x_i}} . \prod_{i=1}^{n}1_{0 <x_i<1}$

According to the Fundamental Neyman Pearson Lemma, the most powerful size $\alpha$ test for testing $H_{0}$ vs $H_{1}$ is given by the test function $\phi$ as follows:

$\phi=\begin{cases} 1 & \text{if} \ \frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k \\ 0 & \text{otherwise} \\ \end{cases}$

where k is such that $E_{H_0}(\phi)=\alpha$.

So, our test criterion is $\frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k$
Plugging in the pdfs, we get the criterion as $\prod_{i=1}^{n} X_i < \frac{1}{2^{2n} k^2} = \lambda$ (say).

Our aim now is to find the value of $\lambda$ from the given size $\alpha$ criterion. Thus,

$P_{H_0}(\prod_{i=1}^{n}X_i < \lambda)=\alpha$

$\iff P_{H_{0}}(\sum_{i=1}^{n} \ln{X_i} < \ln{\lambda}) =\alpha$

$\iff P_{H_{0}}(-2.\sum_{i=1}^{n} \ln{X_i} >-2. \ln{\lambda}) =\alpha$

Now, we state a result: If $X_i \sim U(0,1)$ ,then $-2 \ln{X_i} \sim \chi^2_{2}$ distribution (Prove it yourself!)

As $X_i$’s are independent, due to reproductive property of chi-squared distribution, $-2.\sum_{i=1}^{n} \ln{X_i} \sim \chi^2_{2n}$
Hence, we simply need the value of $\lambda$ such that $P_{H_0}\left(-2\sum_{i=1}^{n} \ln{X_i} > -2 \ln{\lambda}\right)=\alpha$, where $-2\sum_{i=1}^{n} \ln{X_i} \sim \chi^2_{2n}$.
The obvious choice is $-2 \ln{\lambda} = \chi^2_{2n , \alpha}$ , where $\chi^2_{2n , \alpha}$ is the upper $\alpha$ point of $\chi^2_{2n}$ distribution.

So, we have $-2 \ln{\lambda} = \chi^2_{\alpha,2n}$, which implies $\lambda =e^{-\frac{1}{2}\chi^2_{\alpha,2n}}$.
So, our critical region for this test is $\prod_{i=1}^{n} X_i < e^{-\frac{1}{2} \chi^2_{\alpha,2n}}$.
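The critical region can be verified numerically. The sketch below stays numpy-only by estimating $\chi^2_{\alpha,2n}$ from simulated draws of $-2\sum \ln U_i$ itself; the values of $n$ and $\alpha$ are illustrative:

```python
import numpy as np

# Under H0 the data are U(0,1), so -2 * sum(log U_i) ~ chi^2_{2n}.
# Estimate its upper-alpha point, then check the test has size alpha.
rng = np.random.default_rng(8)
n, alpha, N = 8, 0.1, 100_000

stat = -2.0 * np.log(rng.random((500_000, n))).sum(axis=1)
chi2_upper = np.quantile(stat, 1 - alpha)      # chi^2_{alpha, 2n}
lam = np.exp(-0.5 * chi2_upper)                # critical value for prod X_i

prod = rng.random((N, n)).prod(axis=1)
print("empirical size:", np.mean(prod < lam))  # should be near alpha
```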

## Food For Thought

In this problem , look at the supports of the two distributions under the null and alternative hypotheses.
See that both the supports are the same and hence the quantity $\frac{f_1}{f_0}$ is defined everywhere.
But suppose for a problem the two supports are not the same and they are not disjoint then try constructing a most powerful test using the Neyman Pearson Lemma.
For Example:
Let the family of distributions be $\{U(0,\theta) : \theta > 0\}$.
Find the most powerful test for testing $H_0 : \theta=1$ against $H_1: \theta=2$
Note that the supports under null and alternative hypotheses are not the same in this case.
Give it a try!