# ISI MStat PSB 2008 Problem 10 | Hypothesis Testing

This is a beautiful sample problem from ISI MStat PSB 2008 Problem 10. It is based on testing a simple hypothesis against a simple alternative, and it shows how a careful observation can make life simple. Go for it!

## Problem- ISI MStat PSB 2008 Problem 10

Consider a population with three kinds of individuals labelled 1, 2 and 3. Suppose the proportions of individuals of the three types are given by $$f(k, \theta)$$, $$k=1,2,3$$, where $$0<\theta<1$$.

$$f(k, \theta) = \begin{cases} {\theta}^2 & k=1 \\ 2\theta(1-\theta) & k=2 \\ (1-\theta)^2 & k=3 \end{cases}$$

Let $$X_1,X_2,\ldots,X_n$$ be a random sample from this population. Find the most powerful test for testing $$H_0 : \theta =\theta_0$$ versus $$H_1: \theta = \theta_1$$ ($$\theta_0< \theta_1< 1$$).

### Prerequisites

Binomial Distribution.

Neyman-Pearson Lemma.

Test function and power function.

Hypothesis Testing.

## Solution :

This problem is quite beautiful once you observe it closely. Here the distribution of $$X$$ may seem non-standard, but if one looks at the distribution of $$Y=X-1$$ instead of $$X$$, one finds that $$Y \sim \text{Binomial}(2, 1-\theta)$$.
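As a quick numerical sanity check (not part of the original solution; the value $$\theta = 0.3$$ below is an arbitrary illustration), one can verify that $$f(k,\theta)$$ agrees with the $$\text{Binomial}(2, 1-\theta)$$ pmf evaluated at $$k-1$$:

```python
from math import comb

def f(k, theta):
    # pmf of X from the problem statement, k = 1, 2, 3
    return {1: theta**2, 2: 2*theta*(1-theta), 3: (1-theta)**2}[k]

def binom_pmf(y, n, p):
    # Binomial(n, p) pmf at y
    return comb(n, y) * p**y * (1 - p)**(n - y)

theta = 0.3  # arbitrary illustrative value in (0, 1)
for k in (1, 2, 3):
    # Y = X - 1 should be Binomial(2, 1 - theta)
    assert abs(f(k, theta) - binom_pmf(k - 1, 2, 1 - theta)) < 1e-12
print("f(k, theta) matches Binomial(2, 1 - theta) at k - 1")
```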

So let $$p = 1-\theta$$, so that $$0<p<1$$, and let $$p_0= 1-\theta_0$$ and $$p_1=1-\theta_1$$.

Since $$\theta_0< \theta_1$$, we have $$p_0>p_1$$, and our hypotheses reduce to

$$H_0 : p = p_0$$ versus $$H_1: p = p_1$$, where $$1> p_0> p_1$$.

Under $$H_0$$, the joint pmf (of $$Y_i=X_i-1$$) is $$f_0( \vec{y}) = \prod_{i=1}^n {2 \choose y_i} p_0^{y_i}(1-p_0)^{2-y_i}$$, where $$y_i=x_i-1$$, $$i=1,\ldots,n$$,

and under $$H_1$$, the joint pmf is $$f_1( \vec{y}) = \prod_{i=1}^n{2 \choose y_i} p_1^{y_i}(1-p_1)^{2-y_i}$$, where $$y_i=x_i-1$$, $$i=1,\ldots,n$$.

So we can now use the widely used Neyman-Pearson Lemma, and end up with

$$\lambda (\vec{y})=\frac{f_1(\vec{y})}{f_0(\vec{y})}=\frac{\prod_{i=1}^{n} {2 \choose y_i} p_1^{y_i} (1-p_1)^{2-y_i}}{\prod_{i=1}^n {2 \choose y_i} p_0^{y_i}(1-p_0)^{2-y_i}}={\left(\frac{p_1}{p_0}\right)}^{\sum{y_i}} {\left(\frac{1-p_1}{1-p_0}\right)}^{2n-\sum{y_i}}.$$

Now we define the test function $$\phi(\vec{x})= \begin{cases} 1& \lambda^*(\vec{x})> k \\ 0 &\lambda^*(\vec{x}) \le k \end{cases}$$ for some positive constant $$k$$,

where $$\lambda(\vec{y})=\lambda^*(\vec{x})$$ and $$\vec{x}= ( X_1,\ldots,X_n)$$.

So our test rule is: reject $$H_0$$ if $$\phi(\vec{x})=1$$, where $$k$$ is chosen such that, for a given level $$\alpha$$ with $$0<\alpha<1$$,

$$E_{H_0}(\phi(\vec{x})) \le \alpha,$$

with power function $$\beta(\theta)= E_{\theta}(\phi(\vec{x}))$$. Can you find the more explicit condition on the data that is equivalent to $$\lambda^*(\vec{x}) \le k$$? Try it!
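As a numerical illustration (a sketch only; $$\theta_0 = 0.3$$, $$\theta_1 = 0.6$$, $$n = 20$$ and $$\alpha = 0.05$$ are arbitrary choices), note that the likelihood ratio depends on the data only through $$\sum y_i$$ and, since $$p_1 < p_0$$, is decreasing in it, so the rejection region takes the form $$\sum y_i \le c$$. The simulation below estimates the cutoff and the power:

```python
import random

random.seed(0)
n, alpha = 20, 0.05                # illustrative sample size and level
theta0, theta1 = 0.3, 0.6          # illustrative values with theta0 < theta1
p0, p1 = 1 - theta0, 1 - theta1    # Y_i ~ Binomial(2, p) under each hypothesis

def sample_T(p, reps):
    # reps replicates of T = sum of n Binomial(2, p) variables
    return [sum(int(random.random() < p) + int(random.random() < p)
                for _ in range(n)) for _ in range(reps)]

# lambda(y) = (p1/p0)^T * ((1-p1)/(1-p0))^(2n-T) is decreasing in T when
# p1 < p0, so "lambda > k" is equivalent to "T <= c" for some cutoff c.
null_T = sorted(sample_T(p0, 20000))
c = null_T[int(alpha * len(null_T))]   # empirical alpha-quantile under H0

power = sum(t <= c for t in sample_T(p1, 20000)) / 20000
print("cutoff c =", c, "estimated power =", round(power, 3))
```

Because $$T$$ is discrete, the achieved size is only approximately $$\alpha$$; an exact size-$$\alpha$$ test would randomize at the boundary value $$c$$.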

## Food For Thought

Suppose $$\theta_0 \le \theta_1$$. Can you verify that, for any constant $$c$$, $$P_{\theta_1}(X>c) \le P_{\theta_0}(X>c)$$? Can you generalize the situation: what kind of distribution must $$X$$ follow? Think it over until we meet again!

# Life Testing Experiment | ISI MStat 2017 PSB Problem 5

This is a problem from the ISI MStat 2017 Entrance Examination that tests your skill in modelling a life testing experiment using the exponential distribution.

## The Problem:

The lifetime in hours of each bulb manufactured by a particular company follows an independent exponential distribution with mean $$\lambda$$. We need to test the null hypothesis $$H_0: \lambda=1000$$ against $$H_1:\lambda=500$$.
A statistician sets up an experiment with $$50$$ bulbs, with $$5$$ bulbs in each of $$10$$ different locations, to examine their lifetimes.

To get quick preliminary results, the statistician decides to stop the experiment as soon as one bulb fails at each location. Let $$Y_i$$ denote the lifetime of the first bulb to fail at location $$i$$. Obtain the most powerful test of $$H_0$$ against $$H_1$$ based on $$Y_1,Y_2,\ldots,Y_{10}$$ and compute its power.

## Prerequisites:

1. Properties of the Exponential/Gamma distributions.

2. Order Statistics.

## Solution:

As is clear from the arrangement of the bulbs, the first bulb to fail (among the 5 at a given location) has the smallest lifetime among them.

That is, in more mathematical terms, for location $$i$$ we can write $$Y_i = \min(X_{i1},X_{i2},\ldots,X_{i5})$$.

Here, $$X_{ij}$$ denotes the lifetime of the $$j$$ th bulb in the $$i$$ th location, where $$i=1,2,\ldots,10$$ and $$j=1,2,\ldots,5$$.

It is given that each $$X_{ij}$$ follows an exponential distribution with mean $$\lambda$$.

Can you see that $$Y_i$$ is exponential with mean $$\frac{\lambda}{5}$$? You may try to prove the following general result:

If $$X_1,\ldots,X_n$$ is a random sample from an exponential distribution with mean $$\lambda$$,

then $$X_{(1)}=\min(X_1,\ldots,X_n)$$ is exponential with mean $$\frac{\lambda}{n}$$.

So now we have $$Y_1,Y_2,\ldots,Y_{10}$$ in hand, each exponential with mean $$\frac{\lambda}{5}$$.
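This result about the minimum is easy to check by simulation (a sketch, not a proof; the values $$\lambda = 1000$$ and $$n = 5$$ mirror the problem's setup, with the exponential parameterized by its mean):

```python
import random

random.seed(1)
lam, n, reps = 1000.0, 5, 20000   # mean lifetime, bulbs per location, replicates

# random.expovariate takes the rate, so rate 1/lam gives mean lam
mins = [min(random.expovariate(1 / lam) for _ in range(n)) for _ in range(reps)]

avg = sum(mins) / reps
print("sample mean of the minimum:", round(avg, 1), "theory:", lam / n)
```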

The joint pdf is therefore $$f(\mathbf{y} )={\left(\frac{5}{\lambda}\right)}^{10} e^{-\frac{5}{\lambda}\sum_{i=1}^{10} y_i}, \quad y_i>0.$$

For testing $$H_0: \lambda=1000$$ against $$H_1:\lambda=500$$, we use the Neyman Pearson Lemma.

We have the critical region of the most powerful test as $$\frac{f_{H_1}(\mathbf{y})}{f_{H_0}(\mathbf{y})} >c$$,

which after simplification comes out to be $$\bar{Y} < K$$, where $$K$$ is an appropriate constant. (Note the direction: under $$H_1$$ the mean lifetime is smaller, so small values of $$\bar{Y}$$ favour $$H_1$$.)
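A sketch of that simplification, using the convention that each $$Y_i$$ is exponential with mean $$\frac{\lambda}{5}$$ (density $$\frac{5}{\lambda}e^{-5y/\lambda}$$), so the ratio is decreasing in $$\sum y_i$$:

```latex
\frac{f_{H_1}(\mathbf{y})}{f_{H_0}(\mathbf{y})}
= \frac{\left(\tfrac{5}{500}\right)^{10} e^{-\frac{5}{500}\sum y_i}}
       {\left(\tfrac{5}{1000}\right)^{10} e^{-\frac{5}{1000}\sum y_i}}
= 2^{10}\, e^{-\frac{\sum y_i}{200}} > c
\iff \sum_{i=1}^{10} y_i < 200\left(10\ln 2 - \ln c\right)
\iff \bar{Y} < K .
```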

Also, see that $$\sum_{i=1}^{10} Y_i \sim \text{Gamma}(10, \frac{5}{\lambda})$$ (shape, rate), so $$\bar{Y} \sim \text{Gamma}(10, \frac{50}{\lambda})$$.

Can you use this fact to find the value of $$K$$ from the size ($$\alpha$$) condition? (Exercise for the reader)

Also, find the power of the test.
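Both quantities can be approximated by Monte Carlo (a sketch using only the standard library, so the Gamma quantile is estimated from simulated draws rather than looked up in a table; $$\alpha = 0.05$$ is an illustrative choice, and the exponential is parameterized by its mean):

```python
import random

random.seed(2)
reps, alpha = 20000, 0.05          # replicates and illustrative level

def ybar(lam):
    # mean of the 10 first-failure times; each Y_i has mean lam / 5
    return sum(random.expovariate(5 / lam) for _ in range(10)) / 10

# Reject H0: lam = 1000 for small Ybar; K is the alpha-quantile of Ybar
# under H0, estimated here from simulated draws.
null = sorted(ybar(1000) for _ in range(reps))
K = null[int(alpha * reps)]

power = sum(ybar(500) < K for _ in range(reps)) / reps
print("K ~", round(K, 1), "estimated power ~", round(power, 3))
```

The exact answer uses the $$\alpha$$-quantile of the $$\text{Gamma}(10, \frac{1}{20})$$ distribution of $$\bar{Y}$$ under $$H_0$$; the simulation is just a check on that computation.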

## Challenge Problem:

The exponential distribution is widely used to model the lifetimes of appliances. The following scenario is based on such a model.

Suppose electric bulbs have a lifetime distribution with pdf $$f(t)=\lambda e^{-\lambda t}$$ where $$t \in [0, \infty)$$ .

These bulbs are used individually for street lighting in a large number of posts. A bulb is replaced immediately after it burns out.

Let's break down the problem in steps.

(i) Starting from time $$t=0$$, the process is observed till $$t=T$$. Can you calculate the expected number of replacements in a post during the interval $$(0,T)$$?

(ii) Hence, deduce $$g(t)\,\text{dt}$$, the probability of a bulb being replaced in $$(t,t+ \text{dt})$$ for $$t < T$$, irrespective of when the bulb was put in.

(iii) Next, suppose that at the end of the first interval of time $$T$$, all bulbs which were put in the posts before time $$X < T$$ and have not burned out are replaced by new ones, but the bulbs replaced after time $$X$$ continue to be used, provided, of course, that they have not burned out.

Prove that with such a mixture of old and new bulbs, the probability of a bulb having a lifetime $$> \tau$$ in the second interval of length $$T$$ is given by

$$S_2(\tau)=\frac{1}{2}e^{-\lambda \tau}(1+ e^{-\lambda X})$$

Also, try proving the general case where the lifetimes of the bulbs follow the pdf $$f(t)$$ . Here, $$f(t)$$ need not be the pdf of an exponential distribution .

You should be getting $$S_2(\tau)=(1-p)S_1(\tau) + \int_{0}^{X} g(T-x)S_1(x)S_1(\tau +x)\, \text{dx}$$, where $$\tau<T$$,

and where $$p$$ is the proportion of bulbs not replaced at time $$t=T$$ and $$S_1(t)$$ is the probability that a bulb has lifetime $$> t$$.

# Neyman Welcomes You | ISI MStat 2018 PSB Problem 8

This is a problem from the ISI MStat Examination, 2018. It involves the construction of the most powerful test of size $$\alpha$$ using the Neyman-Pearson Lemma. The aim is to find its critical region in terms of quantiles of a standard distribution.

## Problem

Let $$X_1 ,X_2,\ldots, X_n$$ be an i.i.d. sample from $$f(x;\theta)$$, $$\theta \in \{0,1\}$$, with

$$f(x;0) = \begin{cases} 1 & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$$

and $$f(x;1)= \begin{cases} \frac{1}{2 \sqrt{x}} & \text{if} \ 0<x<1 \\ 0 & \text{otherwise} \\ \end{cases}$$
Based on the above sample, obtain the most powerful test for testing $$H_0:\theta=0$$ against $$H_1: \theta=1$$, at level $$\alpha$$, with $$0 < \alpha <1$$. Find the critical region in terms of quantiles of a standard distribution.

## Prerequisites

1. The Fundamental Neyman Pearson Lemma

2. Useful Transformations of Random Variables

3. Properties of standard probability distributions (e.g. Normal, Chi-squared, etc.)

All these topics are included in the regular coursework of undergraduate statistics students. If not, one may refer to standard texts like Casella and Berger.

## Solution

As $$X_1,X_2,\ldots,X_n$$ is a random sample, they are independent by definition.
So their joint pdf when $$\theta=0$$ is given by $$f(\textbf{x},0)= \prod_{i=1}^{n} 1_{0<x_i<1}$$, where $$1_{0<x_i<1}$$ denotes the indicator function of the interval $$(0,1)$$.

Similarly, the joint pdf when $$\theta=1$$ is given by:
$$f(\textbf{x},1)=\frac{1}{2^n \prod_{i=1}^{n}\sqrt{x_i}} . \prod_{i=1}^{n}1_{0 <x_i<1}$$

According to the Fundamental Neyman-Pearson Lemma, the most powerful size $$\alpha$$ test for testing $$H_{0}$$ vs $$H_{1}$$ is given by the test function $$\phi$$ as follows:

$$\phi=\begin{cases} 1 & \text{if} \ \frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k \\ 0 & \text{otherwise} \\ \end{cases}$$

where k is such that $$E_{H_0}(\phi)=\alpha$$.

So, our test criterion is $$\frac{f(\textbf{x},1)}{f(\textbf{x},0)} > k$$
Plugging in the pdfs, we get the criterion as $$\prod_{i=1}^{n} X_i < \frac{1}{2^{2n}k^2} = \lambda$$ (say).

Our aim now is to find the value of $$\lambda$$ from the given size $$\alpha$$ criterion.
Thus,

$$P_{H_0}(\prod_{i=1}^{n}X_i < \lambda)=\alpha$$

$$\iff P_{H_{0}}(\sum_{i=1}^{n} \ln{X_i} < \ln{\lambda}) =\alpha$$

$$\iff P_{H_{0}}(-2.\sum_{i=1}^{n} \ln{X_i} >-2. \ln{\lambda}) =\alpha$$

Now we state a result: if $$X_i \sim U(0,1)$$, then $$-2 \ln{X_i} \sim \chi^2_{2}$$ (prove it yourself!).
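This result is quick to check by simulation (a sketch, not a proof): $$\chi^2_2$$ is the exponential distribution with mean $$2$$, so the sample mean of $$-2\ln U$$ should be near $$2$$ and $$P(-2\ln U \le 2)$$ near $$1-e^{-1} \approx 0.632$$:

```python
import random
from math import log

random.seed(3)
reps = 20000
# 1 - random() lies in (0, 1], which avoids log(0)
z = [-2 * log(1 - random.random()) for _ in range(reps)]

# chi-squared with 2 df is the exponential distribution with mean 2
mean = sum(z) / reps
p = sum(v <= 2 for v in z) / reps   # theory: 1 - e^{-1} ~ 0.632
print("mean ~", round(mean, 2), "P(Z <= 2) ~", round(p, 3))
```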

As the $$X_i$$'s are independent, by the reproductive property of the chi-squared distribution, $$-2\sum_{i=1}^{n} \ln{X_i} \sim \chi^2_{2n}$$.
Hence we simply need the value of $$\lambda$$ such that $$P_{H_0}(-2\sum_{i=1}^{n} \ln{X_i} > -2 \ln{\lambda})=\alpha$$.
The obvious choice is $$-2 \ln{\lambda} = \chi^2_{2n , \alpha}$$, where $$\chi^2_{2n , \alpha}$$ is the upper $$\alpha$$ point of the $$\chi^2_{2n}$$ distribution.

So $$-2 \ln{\lambda} = \chi^2_{2n, \alpha}$$ implies $$\lambda =e^{-\frac{1}{2}\chi^2_{2n, \alpha}}$$.
So our critical region for this test is $$\prod_{i=1}^{n} X_i < e^{-\frac{1}{2} \chi^2_{2n, \alpha}}$$.
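The whole test can be sketched by simulation (an illustration with $$n = 10$$ and $$\alpha = 0.05$$, both arbitrary; the critical value is estimated empirically rather than from a $$\chi^2$$ table, and under $$H_1$$ one can sample $$X = U^2$$ with $$U \sim U(0,1)$$, since $$P(U^2 \le x) = \sqrt{x}$$ gives exactly the pdf $$\frac{1}{2\sqrt{x}}$$):

```python
import random
from math import log

random.seed(4)
n, alpha, reps = 10, 0.05, 20000  # illustrative choices

def stat(theta):
    # -2 * sum(log X_i); under H1 (theta = 1), X = U^2 has pdf 1/(2 sqrt(x))
    xs = [(1 - random.random()) if theta == 0 else (1 - random.random()) ** 2
          for _ in range(n)]
    return -2 * sum(log(x) for x in xs)

# Reject H0 for large values of the statistic; the empirical upper alpha
# point under H0 estimates the chi^2_{2n} upper alpha quantile.
null = sorted(stat(0) for _ in range(reps))
crit = null[int((1 - alpha) * reps)]

size = sum(stat(0) > crit for _ in range(reps)) / reps
power = sum(stat(1) > crit for _ in range(reps)) / reps
print("size ~", round(size, 3), "power ~", round(power, 3))
```

With $$n = 10$$ the estimated critical value should sit near the tabulated $$\chi^2_{20, 0.05} \approx 31.4$$, and the estimated size should be close to $$0.05$$.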

## Food For Thought

In this problem, look at the supports of the two distributions under the null and alternative hypotheses.
Both supports are the same, so the ratio $$\frac{f_1}{f_0}$$ is defined everywhere.
But suppose for a problem the two supports are not the same, yet not disjoint; then try constructing a most powerful test using the Neyman-Pearson Lemma.
For Example:
Let the family of distributions be $$\{U(0,\theta) : \theta > 0\}$$, i.e. $$X \sim U(0,\theta)$$.
Find the most powerful test for testing $$H_0 : \theta=1$$ against $$H_1: \theta=2$$.
Note that the supports under null and alternative hypotheses are not the same in this case.
Give it a try!