# ISI MStat PSB 2014 Problem 9 | Hypothesis Testing

This is a another beautiful sample problem from ISI MStat PSB 2014 Problem 9. It is based on testing simple hypothesis, but reveals and uses a very cute property of Geometric distribution, which I prefer calling sister to Loss of memory . Give it a try !

**Problem**- ISI MStat PSB 2014 Problem 9

Let \( X_i \sim Geo(p_1)\) and \( X_2 \sim Geo(p_2)\) be independent random variables, where Geo(p) refers to Geometric distribution whose p.m.f. f is given by,

\(f(k)=p(1-p)^k, k=0,1,.....\)

We are interested in testing the null hypothesis \(H_o : p_1=p_2\) against the alternative \( H_1: p_1<p_2\). Intuitively it is clear that we should reject if \(X_1\) is large, but unfortunately, we cannot compute the cut-off because the distribution of \(X_1\) under \(H_o\) depends on the unknown (common) value \(p_1\) and \(p_2\).

(a) Let \(Y= X_1 +X_2\). Find the conditional distribution of \( X_1|Y=y\) when \(p_1=p_2\).

(b) Based on the result obtained in (a), derive a level 0.05 test for \(H_o\) against \(H_1\) when \(X_1\) is large.

**Prerequisites**

Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution .

Conditional Distribution . .

Simple Hypothesis Testing.

## Solution :

Well, Part (a), is quite easy, but interesting and elegant, so I'm leaving it as an exercise, for you to have the fun. Hint: verify whether the required distribution is Discrete uniform or not ! If you are done, proceed .

Now, part (b), is further interesting, because here we will not use the conventional way of analyzing the distribution of \(X_1\) and \( X_2\), whereas we will be concentrating ourselves on the conditional distribution of \( X_1 | Y=y\) ! But why ?

The reason behind this adaptation of strategy is required, one of the reason is already given in the question itself, but the other reason is more interesting to observe , i.e. if you are done with (a), then by now you found that , the conditional distribution of \(X_1|Y=y\) is independent of any parameter ( i.e. ithe distribution of \(X_1\) looses all the information about the parameter \(p_1\) , when conditioned by Y=y , \(p_1=p_2\) is a necessary condition), and the parameter independent conditional distribution is nothing but a Discrete Uniform {0,1,....,y}, where y is the sum of \(X_1 \) and \(X_2\) .

so, under \(H_o: p_1=p_2\) , the distribution of \(X_1|Y=y\) is independent of the both common parameter \(p_1 \) and \(p_2\) . And clearly as stated in the problem itself, its intuitively understandable , large value of \(X_1\) exhibits evidences against \(H_o\). Since large value of \(X_1\) is realized, means the success doesn't come very often .i.e. \(p_1\) is smaller.

So, there will be strong evidence against \(H_o\) if \(X_1 > c\) , where , for some constant \(c \ge y\), where y is given the sum of \(X_1+X_2\).

So, for a level 0.05 test , the test will reject \(H_o\) for large value of k , such that,

\( P_{H_o}( X_1 > c| Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y - 0.05 .\)

So, we reject \(H_o\) at level 0.05, when we observe \( X_1 > 0.95y - 0.05 \) , where it is given that \(X_1+X_2\) =y . That's it!

## Food For Thought

Can you show that for this same \(X_1 \) and \( X_2\) ,

\(P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n) \)

considering \(p_1=p_2=p\) , where n=0,1,.... What about the converse? Does it hold? Find out!

But avoid loosing memory, it's beauty is exclusively for Geometric ( and exponential) !!