
This is another beautiful sample problem, ISI MStat PSB 2014 Problem 9. It is based on testing a simple hypothesis, but it reveals and uses a very cute property of the Geometric distribution, which I like to call a sister of the loss-of-memory property. Give it a try!

## Problem: ISI MStat PSB 2014 Problem 9

Let $X_1 \sim Geo(p_1)$ and $X_2 \sim Geo(p_2)$ be independent random variables, where $Geo(p)$ refers to the Geometric distribution whose p.m.f. $f$ is given by

$f(k)=p(1-p)^k, \quad k=0,1,\ldots$

We are interested in testing the null hypothesis $H_o : p_1=p_2$ against the alternative $H_1: p_1<p_2$. Intuitively, it is clear that we should reject if $X_1$ is large, but unfortunately we cannot compute the cut-off, because the distribution of $X_1$ under $H_o$ depends on the unknown common value of $p_1$ and $p_2$.

(a) Let $Y= X_1 +X_2$. Find the conditional distribution of $X_1|Y=y$ when $p_1=p_2$.

(b) Based on the result obtained in (a), derive a level 0.05 test for $H_o$ against $H_1$ that rejects $H_o$ when $X_1$ is large.

### Prerequisites

Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution.

Conditional Distribution.

Simple Hypothesis Testing.

## Solution

Well, part (a) is quite easy, but interesting and elegant, so I’m leaving it as an exercise for you to have the fun. Hint: check whether the required distribution is Discrete Uniform or not! Once you are done, proceed.

Now, part (b) is even more interesting, because here we will not follow the conventional route of analyzing the distributions of $X_1$ and $X_2$; instead, we will concentrate on the conditional distribution of $X_1 | Y=y$. But why?

There are two reasons for this change of strategy. One is already given in the question itself. The other is more interesting to observe: if you are done with (a), then by now you have found that the conditional distribution of $X_1|Y=y$ does not depend on any parameter. That is, once we condition on $Y=y$, the distribution of $X_1$ loses all information about the parameter $p_1$, provided $p_1=p_2$ (this condition is necessary). This parameter-free conditional distribution is nothing but the Discrete Uniform on $\{0,1,\ldots,y\}$, where $y$ is the sum of $X_1$ and $X_2$.
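If you would like to double-check your answer to (a) numerically, here is a minimal sketch. It assumes the p.m.f. $f(k)=p(1-p)^k$ from the problem statement and uses exact rational arithmetic; the function names `geo_pmf` and `conditional_pmf` are made up for this illustration.

```python
from fractions import Fraction


def geo_pmf(k, p):
    """P(X = k) for the p.m.f. f(k) = p (1 - p)^k, k = 0, 1, ..."""
    return p * (1 - p) ** k


def conditional_pmf(y, p):
    """P(X1 = x | X1 + X2 = y) for x = 0, ..., y, assuming p1 = p2 = p."""
    # Joint probability of (X1 = x, X2 = y - x), using independence.
    joint = [geo_pmf(x, p) * geo_pmf(y - x, p) for x in range(y + 1)]
    total = sum(joint)  # this is P(Y = y)
    return [j / total for j in joint]


# Fractions keep everything exact, so equality checks are safe.
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    for y in (0, 1, 4, 10):
        assert conditional_pmf(y, p) == [Fraction(1, y + 1)] * (y + 1)

print(conditional_pmf(4, Fraction(3, 10)))
```

Whatever common value of $p$ you plug in, every entry comes out as $\frac{1}{y+1}$. This is only a check for a few values, not a proof, so the exercise is still yours.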

So, under $H_o: p_1=p_2$, the distribution of $X_1|Y=y$ does not depend on the common value of $p_1$ and $p_2$. And, as stated in the problem itself, it is intuitively clear that a large value of $X_1$ is evidence against $H_o$: a large $X_1$ means that success does not come very often, i.e. $p_1$ is small.

So, there will be strong evidence against $H_o$ if $X_1 > c$ for some constant $c \le y$, where $y$ is the given value of the sum $X_1+X_2$.

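So, for a level 0.05 test, we reject $H_o$ for large values of $X_1$, with the cut-off $c$ chosen such that

$P_{H_o}( X_1 > c| Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y - 0.05 .$

Here $\frac{y-c}{y+1}$ counts the $y-c$ values $c+1,\ldots,y$, each having probability $\frac{1}{y+1}$, so the formula needs $c$ to be an integer. Since $0.95y-0.05$ is an integer only when $y+1$ is a multiple of 20, in general we take $c$ to be the smallest integer with $c \ge 0.95y-0.05$, which gives $P_{H_o}( X_1 > c| Y=y) \le 0.05$.

So, we reject $H_o$ at level 0.05 when we observe $X_1 > \lceil 0.95y - 0.05 \rceil$, where it is given that $X_1+X_2=y$. That’s it!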
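To see the test in action, here is a small sketch. It relies only on the Uniform $\{0,1,\ldots,y\}$ law of $X_1|Y=y$ under $H_o$, so that $P_{H_o}(X_1 \ge x | Y=y) = \frac{y-x+1}{y+1}$, and it rejects when this conditional tail probability is at most 0.05. Phrasing the rule this way keeps the size at or below 0.05 even when $0.95y-0.05$ is not an integer. The function names are hypothetical, chosen just for this illustration.

```python
from fractions import Fraction


def conditional_p_value(x1, y):
    """P(X1 >= x1 | Y = y) under H_o, using the Uniform{0, ..., y} law."""
    return Fraction(y - x1 + 1, y + 1)


def reject_h0(x1, x2, alpha=Fraction(1, 20)):
    """Reject H_o when the conditional tail probability is at most alpha."""
    y = x1 + x2
    return conditional_p_value(x1, y) <= alpha


def conditional_size(y, alpha=Fraction(1, 20)):
    """Exact P(reject | Y = y) under H_o for the rule above."""
    rejected = sum(1 for x1 in range(y + 1) if reject_h0(x1, y - x1, alpha))
    return Fraction(rejected, y + 1)


print(reject_h0(19, 0))      # True: tail probability is 1/20
print(reject_h0(19, 1))      # False: tail probability is 2/21
print(conditional_size(19))  # 1/20
print(conditional_size(5))   # 0
```

Notice that for small $y$ (here $y=5$) the test can never reject, since even the most extreme outcome $X_1=y$ has conditional probability $\frac{1}{y+1} > 0.05$. A randomized test could attain size 0.05 exactly, but that is outside what this sketch does.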

## Food For Thought

Can you show that for these same $X_1$ and $X_2$,

$P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n)$

taking $p_1=p_2=p$, where $n=0,1,\ldots$? What about the converse? Does it hold? Find out!

But avoid losing memory; its beauty is exclusively for the Geometric (and Exponential)!!
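Before hunting for a proof of the identity in Food For Thought, you may want to convince yourself numerically that it holds. The sketch below assumes $p_1=p_2=p$ and the p.m.f. $f(k)=p(1-p)^k$, computes $P(X_1+X_2=n)$ by direct convolution, and compares both sides exactly. All function names are made up for this illustration.

```python
from fractions import Fraction


def geo_cdf(n, p):
    """P(X1 <= n) for the p.m.f. f(k) = p (1 - p)^k."""
    return sum(p * (1 - p) ** k for k in range(n + 1))


def sum_pmf(n, p):
    """P(X1 + X2 = n) by direct convolution of two independent Geo(p)."""
    return sum(
        p * (1 - p) ** k * p * (1 - p) ** (n - k) for k in range(n + 1)
    )


def sum_cdf(n, p):
    """P(X1 + X2 <= n)."""
    return sum(sum_pmf(m, p) for m in range(n + 1))


def identity_gap(n, p):
    """Left-hand side minus right-hand side of the identity."""
    lhs = geo_cdf(n, p) - sum_cdf(n, p)
    rhs = (1 - p) / p * sum_pmf(n, p)
    return lhs - rhs


# Exact rational arithmetic: the gap should be exactly zero.
for p in (Fraction(1, 4), Fraction(1, 2), Fraction(7, 8)):
    assert all(identity_gap(n, p) == 0 for n in range(15))
```

This checks the forward direction for a handful of values only; the converse is still left for you to settle.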