Categories

Testing of Hypothesis | ISI MStat 2016 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2016 involving the basic idea of Type 1 error of Testing of Hypothesis but focussing on the fundamental relationship of Exponential Distribution and the Geometric Distribution.

The Problem:

Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from an exponential distribution with mean $\lambda$.

Assume that the observed data is available on $\left[X_{1}\right], \ldots,\left[X_{n}\right]$, instead of $X_{1}, \ldots, X_{n},$ where $[x]$ denotes the largest integer less than or equal to $x$.

Consider a test for $H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$ which rejects $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Given $\alpha \in(0,1),$ obtain values of $c_{n}$ such that the size of the test converges to $\alpha$ as $n \rightarrow \infty$.

Prerequisites:

(a) Testing of Hypothesis

(b) Type 1 Error

(c) Exponential Distribution

(d) Relationship of Exponential Distribution and Geometric Distribution

(e) Central Limit Theorem

Solution:

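• If $X \sim$ Exponential($\lambda$) with rate $\lambda$ (density $\lambda e^{-\lambda x}$, mean $1/\lambda$) and $a>0$, then $Y = \left[\frac{X}{a}\right] \sim$ Geom($p$), where $p = 1-e^{-\lambda a} \in(0,1)$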

Proof:

$Y$ is clearly discrete taking values in the set of non-negative integers, due to the flooring. Then, for any integer $n \geq 0$ we have
$\begin{array}{c} P(Y=n)=P(X \in[a n, a(n+1))) \\ =\int_{a n}^{a(n+1)} \lambda \mathrm{e}^{-\lambda x} d x=e^{-\lambda a n}-e^{-\lambda a(n+1)}=(1-p)^{n} p \end{array}$
where $p=1-e^{-\lambda a} \in(0,1),$ as $\lambda>0$ and $a>0$.
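The identity in the proof can be sanity-checked numerically. The following is a minimal sketch, assuming $\lambda$ is the rate of the exponential distribution as in the proof; the function names and the sample values of `lam` and `a` are illustrative choices, not part of the original problem.

```python
import math

def floor_exp_pmf(n, lam, a):
    """P([X/a] = n) for X ~ Exponential with rate lam, from the exponential CDF."""
    cdf = lambda x: 1.0 - math.exp(-lam * x)
    return cdf(a * (n + 1)) - cdf(a * n)

def geom_pmf(n, p):
    """P(Y = n) for a geometric variable counting failures before the first success."""
    return (1.0 - p) ** n * p

lam, a = 0.7, 1.5  # illustrative values (assumptions)
p = 1.0 - math.exp(-lam * a)

# Largest discrepancy between the two expressions over n = 0, ..., 29
max_gap = max(abs(floor_exp_pmf(n, lam, a) - geom_pmf(n, p)) for n in range(30))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, which is what the closed-form integral predicts.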

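• If $Y_1, \ldots, Y_n$ are i.i.d. Geom($p$), then $\sum_{i = 1}^{n} Y_i \sim$ NBinom($n,p$)
• If $X_1, \ldots, X_n$ are i.i.d. Exponential($\lambda$), then $S_n = \sum_{i=1}^{n}\left[X_{i}\right] \sim$ NBinom($n,p$), where $p = 1-e^{-\lambda} \in(0,1)$ (take $a=1$ above). If $\lambda$ is read as the mean, as in the problem statement, the rate is $1/\lambda$ and $p = 1-e^{-1/\lambda}$; the two readings agree at $\lambda = 1$, which is all that $H_0$ requires.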

Testing of Hypothesis

$H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$

We reject $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Here, the size of the test, i.e. the probability of Type 1 error (the null hypothesis is simple), is $\alpha_n = P(S_n > c_{n} | \lambda=1)$.

We want to select $c_n$ such that $\alpha_n \to \alpha$.

$S_n$ ~ NBinom($n,p$), where $p = 1-e^{-1}$ under $H_0$.

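Under $H_0$, $S_n$ is a sum of $n$ i.i.d. Geom($p$) variables counting failures, each with mean $\frac{1-p}{p}$ and variance $\frac{1-p}{p^2}$. Hence, $\frac{\sqrt{n}(\frac{S_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}} \rightarrow Z \sim N(0,1)$ in distribution, by the Central Limit Theorem.

Observe that $\alpha_n = P(S_n > c_{n} | \lambda=1) = P\left(\frac{\sqrt{n}(\frac{S_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}} > \frac{\sqrt{n}(\frac{c_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}}\right)$, which converges to $\alpha$ if the threshold on the right-hand side is held at $z_{\alpha}$, the upper $\alpha$ point of $N(0,1)$, i.e. $P(Z > z_{\alpha}) = \alpha$.

Thus, we set $\frac{\sqrt{n}(\frac{c_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}} = z_{\alpha}$.

Solving this, $c_n = \frac{n(1-p)}{p} + z_{\alpha}\frac{\sqrt{n(1-p)}}{p}$, and with $p = 1-e^{-1}$ this becomes $c_n = \frac{n + z_{\alpha}\sqrt{ne}}{e-1}$.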
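To see how such a threshold behaves for finite $n$, here is a minimal sketch. It takes the mean $\frac{1-p}{p}$ and variance $\frac{1-p}{p^2}$ of the failure-counting geometric distribution, so that $c_n = \frac{n(1-p)}{p} + z_{\alpha}\frac{\sqrt{n(1-p)}}{p}$ with $p = 1-e^{-1}$, and compares the exact negative binomial tail with $\alpha$. The function names and the choices $n = 2000$, $\alpha = 0.05$ are assumptions made for illustration.

```python
import math
from statistics import NormalDist

P_NULL = 1.0 - math.exp(-1.0)  # success probability under H0 (lambda = 1)

def critical_value(n, alpha):
    """Normal-approximation threshold c_n for the test S_n > c_n."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha point of N(0, 1)
    return n * (1.0 - P_NULL) / P_NULL + z * math.sqrt(n * (1.0 - P_NULL)) / P_NULL

def exact_size(n, c):
    """Exact P(S_n > c) for S_n ~ NBinom(n, p) counting failures."""
    cdf = 0.0
    for k in range(math.floor(c) + 1):
        log_pmf = (math.lgamma(n + k) - math.lgamma(k + 1) - math.lgamma(n)
                   + n * math.log(P_NULL) + k * math.log(1.0 - P_NULL))
        cdf += math.exp(log_pmf)
    return 1.0 - cdf

n, alpha = 2000, 0.05  # illustrative values (assumptions)
c_n = critical_value(n, alpha)
print(c_n, exact_size(n, c_n))
```

The exact size need not equal $\alpha$ for any fixed $n$; the claim is only that it approaches $\alpha$ as $n$ grows.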

Food for Thought

If $X \sim$ Exponential($\lambda$), then what is the distribution of $\{X\}$, the fractional part of $X$? This question is crucial in getting back the Exponential Distribution from the Geometric Distribution.

Rather, the food for thought asks you how we get the Exponential Distribution from the Geometric Distribution.

Stay Tuned. Stay Blessed! See you in the next post.


ISI MStat PSB 2009 Problem 8 | How big is the Mean?

This is a very simple and regular sample problem from ISI MStat PSB 2009 Problem 8. It is based on testing the nature of the mean of the Exponential distribution. Give it a try!

Problem– ISI MStat PSB 2009 Problem 8

Let $X_1, \ldots, X_n$ be i.i.d. observations from the density,

$f(x)=\frac{1}{\mu}\exp\left(-\frac{x}{\mu}\right) , x>0$

where $\mu >0$ is an unknown parameter.

Consider the problem of testing the hypothesis $H_o : \mu \le \mu_o$ against $H_1 : \mu > \mu_o$.

(a) Show that the test with critical region $[\bar{X} \ge \mu_o {\chi_{2n,1-\alpha}}^2/2n]$, where ${\chi^2}_{2n,1-\alpha}$ is the $(1-\alpha)$th quantile of the ${\chi^2}_{2n}$ distribution, has size $\alpha$.

(b) Give an expression of the power in terms of the c.d.f. of the ${\chi^2}_{2n}$ distribution.

Prerequisites

Likelihood Ratio Test

Exponential Distribution

Chi-squared Distribution

Solution :

This problem is quite regular and simple. From the given form of the hypotheses, it is almost clear that using Neyman-Pearson directly can land you in trouble, since both hypotheses are composite. So, let's go for something more general, that is, Likelihood Ratio Testing.

Hence, the Likelihood function of $\mu$ for the given sample is,

$L(\mu | \vec{X})=(\frac{1}{\mu})^n \exp(-\frac{\sum_{i=1}^n X_i}{\mu}) , \mu>0$; also observe that the sample mean $\bar{X}$ is the MLE of $\mu$.

So, the Likelihood Ratio statistic is,

$\lambda(\vec{x})=\frac{\sup_{\mu \le \mu_o}L(\mu |\vec{x})}{\sup_\mu L(\mu |\vec{x})} \\ =\begin{cases} 1 & \mu_o \ge \bar{X} \\ \frac{L(\mu_o|\vec{x})}{L(\bar{X}|\vec{x})} & \mu_o < \bar{X} \end{cases}$

So, our test function is ,

$\phi(\vec{x})=\begin{cases} 1 & \lambda(\vec{x})<k \\ 0 & otherwise \end{cases}$.

We reject $H_o$ at size $\alpha$ when $\phi(\vec{x})=1$, where $k$ is chosen such that $E_{H_o}(\phi) \le \alpha$.

Since $\lambda(\vec{x})=1$ whenever $\bar{X} \le \mu_o$, rejection can occur only when $\bar{X} > \mu_o$, where

$\lambda(\vec{x}) < k \\ \Rightarrow L(\mu_o|\vec{x})<kL(\bar{X}|\vec{x}) \\ \Rightarrow -n\ln \mu_o -\frac{1}{\mu_o}\sum_{i=1}^n X_i < \ln k -n \ln \bar{X} -n \\ \Rightarrow n \ln \bar{X}-\frac{n\bar{X}}{\mu_o} < K^*$,

for some constant $K^* = \ln k - n + n\ln \mu_o$.

Let $g(\bar{x})=n\ln \bar{x} -\frac{n\bar{x}}{\mu_o}$, and observe that $g$ is a decreasing function of $\bar{x}$ for $\bar{x} \ge \mu_o$, since $g'(\bar{x}) = \frac{n}{\bar{x}} - \frac{n}{\mu_o} \le 0$ there.

Hence, there exists a $c$ such that for $\bar{x} \ge c$, we have $g(\bar{x}) < K^*$. See the figure.

So, the critical region of the test is of form $\bar{X} \ge c$, for some $c$ such that,

$P_{H_o}(\bar{X} \ge c)=\alpha$, for some $0 \le \alpha \le 1$, where $\alpha$ is the size of the test.

Now, our task is to find $c$. For that, observe that if $X \sim Exponential(\theta)$ with mean $\theta$, then $\frac{2X}{\theta} \sim {\chi^2}_2$.

Hence, in this problem, since the $X_i$’s follow $Exponential(\mu)$, we have $\frac{2n\bar{X}}{\mu} \sim {\chi^2}_{2n}$. The probability $P_{\mu}(\bar{X} \ge c)$ increases with $\mu$, so over $\mu \le \mu_o$ it is largest at the boundary $\mu = \mu_o$, and it is enough to work there:

$P_{H_o}(\bar{X} \ge c)=\alpha \\ \Rightarrow P_{H_o}(\frac{2n\bar{X}}{\mu_o} \ge \frac{2nc}{\mu_o})=\alpha \\ \Rightarrow P({\chi^2}_{2n} \ge \frac{2nc}{\mu_o})=\alpha$,

which gives $c=\frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}$.

Hence, the rejection region is indeed $[\bar{X} \ge \frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}]$.

Hence Proved !

(b) Now, we know that the power of the test is,

$\beta(\mu)= E_{\mu}(\phi) \\ = P_{\mu}(\lambda(\vec{x})<k)=P_{\mu}(\bar{X} \ge \frac{\mu_o {\chi^2}_{2n;1-\alpha}}{2n}) \\ = P({\chi^2}_{2n} \ge \frac{\mu_o}{\mu}{\chi^2}_{2n;1-\alpha})$.

Hence, writing $F_{2n}$ for the c.d.f. of the ${\chi^2}_{2n}$ distribution, the power of the test is $\beta(\mu) = 1 - F_{2n}(\frac{\mu_o}{\mu}{\chi^2}_{2n;1-\alpha})$.
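The size and power expressions can be evaluated numerically. The sketch below is one way to do it with only the standard library, relying on the identity $P({\chi^2}_{2n} > x) = e^{-x/2}\sum_{k=0}^{n-1}\frac{(x/2)^k}{k!}$, which holds for even degrees of freedom. The function names and the sample values $n = 10$, $\alpha = 0.05$, $\mu_o = 1$ are assumptions made for illustration.

```python
import math

def chi2_sf_even(x, n):
    """P(chi-square with 2n degrees of freedom > x), via the Poisson-sum identity."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, n):
        term *= half / k
        total += term
    return total

def chi2_upper_point(alpha, n):
    """Value q with P(chi-square_{2n} > q) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, n) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(mu, mu0, n, alpha):
    """P_mu(Xbar >= mu0 * q / (2n)) = P(chi-square_{2n} >= mu0 * q / mu)."""
    q = chi2_upper_point(alpha, n)
    return chi2_sf_even(mu0 * q / mu, n)

n, alpha, mu0 = 10, 0.05, 1.0  # illustrative values (assumptions)
for mu in (0.5, 1.0, 1.5, 2.0):
    print(mu, power(mu, mu0, n, alpha))
```

At $\mu = \mu_o$ the power equals the size $\alpha$, and it rises as $\mu$ moves further into the alternative.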

Food For Thought

Can you use any other testing procedure to conduct this test ?


ISI MStat PSB 2014 Problem 4 | The Machine’s Failure

This is a very simple sample problem from ISI MStat PSB 2014 Problem 4. It is based on order statistics, but generally, due to one’s ignorance of order statistics, one misses the subtleties. Be careful!

Problem– ISI MStat PSB 2014 Problem 4

Consider a machine with three components whose times to failure are independently distributed as exponential random variables with mean $\lambda$. The machine continues to work as long as at least two components work. Find the expected time to failure of the machine.

Prerequisites

Exponential Distribution

Order statistics

Basic counting

Solution :

As stated in the problem, let the three components of the machine be A, B and C, where $X_A, X_B$ and $X_C$ are the survival times of the respective parts. It is also given that $X_A, X_B$ and $X_C$ follow $exponential(\lambda)$, and clearly these random variables are i.i.d.

Now, here comes the trick! It is given that the machine stops when two or all parts of the machine stop working. Here, we sometimes get confused and start thinking combinatorially. But then we forget that the basic counting of combinatorics lies in ordering! Suppose we order the lifetimes of the individual components, i.e. among $X_A, X_B$ and $X_C$ there exists an ordering, and if we write them in order, we have $X_{(1)} \le X_{(2)} \le X_{(3)}$.

Now observe that, after $X_{(2)}$ units of time, the machine will stop !! (Are you sure ?? think it over ).

So, expected time till the machine stops , is just $E(X_{(2)})$, but to find this we need to know the distribution of $X_{(2)}$.

We have the pdf of $X_{(2)}$ as, $f_{(2)}(x)= \frac{3!}{(2-1)!(3-2)!} [P(X \le x)]^{2-1}[P(X>x)]^{3-2}f_X(x)$.

where $f_X(x)$ is the pdf of the exponential distribution with mean $\lambda$.

So, $E(X_{(2)})= \int^{\infty}_0 xf_{(2)}(x)dx$, which will turn out to be $\frac{5\lambda}{6}$. I leave this for the readers to verify, hence concluding my solution.
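For readers who want a numerical cross-check of $\frac{5\lambda}{6}$ before doing the integral by hand, here is a minimal sketch that integrates $x f_{(2)}(x)$ with a midpoint rule. It assumes $\lambda$ is the mean of the exponential distribution, as in the problem; the function names, the truncation point and the step count are illustrative choices.

```python
import math

def second_order_pdf(x, mean):
    """Density of the 2nd order statistic of 3 i.i.d. exponentials with the given mean."""
    F = 1.0 - math.exp(-x / mean)      # P(X <= x)
    f = math.exp(-x / mean) / mean     # exponential pdf
    return 6.0 * F * (1.0 - F) * f     # 3!/(1! 1!) * F * (1 - F) * f

def expected_failure_time(mean, upper_multiple=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of x * f_(2)(x) over [0, upper_multiple * mean]."""
    h = upper_multiple * mean / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * second_order_pdf(x, mean)
    return total * h

mean = 2.0  # illustrative value of lambda (assumption)
print(expected_failure_time(mean), 5.0 * mean / 6.0)
```

The two printed numbers should agree closely, since the integrand is smooth and its tail beyond the truncation point is negligible.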

Food For Thought

Now, suppose you want to install an alarm system, which will notify you some time before the machine wears out!! So, what do you think your strategy should be? Given that you have a strategy, you now replace the worn-out part of the machine within the time period between the alarm ringing and the machine stopping, to continue uninterrupted working. What is the expected time within which you must act?

Keep the machine running !!


ISI MStat PSB 2012 Problem 3 | Finding the Distribution of a Random Variable

This is a very beautiful sample problem from ISI MStat PSB 2012 Problem 3 based on finding the distribution of a random variable . Let’s give it a try !!

Problem– ISI MStat PSB 2012 Problem 3

Let $X_{1}$ and $X_{2}$ be i.i.d. exponential random variables with mean $\lambda>0$. Let $Y_{1}=X_{1}-X_{2}$ and $Y_{2}=R X_{1}-(1-R) X_{2},$ where $R$ is a Bernoulli random variable with parameter $1 / 2$ and is independent of $X_{1}$ and $X_{2}$.
(a) Show that $Y_{1}$ and $Y_{2}$ have the same distribution.
(b) Obtain the common density function.

Prerequisites

Cumulative Distribution Function

Bernoulli distribution

Exponential Distribution

Solution :

The calculation below writes the exponential density as $\lambda e^{-\lambda x}$, $x \ge 0$, i.e. it treats $\lambda$ as the rate. If $\lambda$ is read as the mean, as in the problem statement, replace $\lambda$ by $1/\lambda$ throughout.

Let the cumulative distribution function of $Y_{1}$ be

$F_{Y_{1}}(y_{1})=P(Y_{1} \leq y_{1})=P(X_{1}-X_{2} \leq y_{1})$, $y_1 \in \mathbb{R}$

$=P(X_{1} \leq y_{1}+X_{2})$
Now, given $X_2 = x_2$, this probability is positive only when $y_{1}+x_{2} \ge 0 \Rightarrow x_{2} \ge-y_{1}$
Now, if $y_{1} \ge 0$ then,
$P(X_{1} \le y_{1}+X_{2}) =\int_{0}^{\infty} P(X_{1} \le y_{1}+x_{2}) \lambda e^{-\lambda x_{2}} d x_{2}$

=$\int_{0}^{\infty} \int_{0}^{y_{1}+x_{2}} \lambda e^{-\lambda x_{1}} \times \lambda e^{-\lambda x_{2}} d x_{1} d x_{2}$

=$\int_{0}^{\infty} \lambda e^{-\lambda x_{2}} (1-e^{-\lambda (y_{1}+x_{2}) }) d x_{2}$

=$\int_{0}^{\infty} \lambda e^{-\lambda x_{2}} d x_{2}-\int_{0}^{\infty} \lambda e^{-\lambda (y_{1}+2 x_{2})} d x_{2}$

=$1-\frac{e^{-\lambda y_{1}}}{2}$

Now, if $y_{1} < 0$ then,
$P(X_{1} \leq y_{1}+X_{2}) =\int_{-y_{1}}^{\infty} \int_{0}^{y_{1}+x_{2}} \lambda e^{-\lambda x_{1}} \times \lambda e^{-\lambda x_{2}} d x_{1} d x_{2}$
$=\int_{-y_{1}}^{\infty} \lambda e^{-\lambda x_{2}}(1-e^{-\lambda(y_{1}+x_{2})}) d x_{2}$
$=\int_{-y_{1}}^{\infty} \lambda e^{-\lambda x_{2}} d x_{2}-\int_{-y_{1}}^{\infty} \lambda e^{-\lambda(y_{1}+2 x_{2})} d x_{2}$
$=e^{\lambda y_{1}}-\frac{e^{-\lambda y_{1}}}{2} \times e^{2 \lambda y_{1}}$
$=\frac{e^{\lambda y_{1}}}{2}$
Therefore, $F_{Y_{1}}(y_{1}) = \begin{cases} 1-\frac{e^{-\lambda y_{1}}}{2} & , \text { if } y_{1} \ge 0 \\ \frac{e^{\lambda y_{1}}}{2} & , \text { if } y_{1}<0 \end{cases}.$

Let the cumulative distribution function of $Y_{2}$ be $F_{Y_{2}}(y_{2})=P(Y_{2} \le y_{2})$, $y_2 \in \mathbb{R}$

=$P(Y_{2} \le y_{2} \mid R=1) P(R=1)+P(Y_{2} \le y_{2} \mid R=0) P(R=0)$
$=P(X_{1} \le y_{2}) \times \frac{1}{2}+P(-X_{2} \le y_{2}) \times \frac{1}{2}$
= $\begin{cases} \frac{1}{2} [F_{X_{1}}(y_{2})+1] & , y_{2} \ge 0 \\ \frac{1}{2} [1-F_{X_{2}}(-y_{2})] & ,y_{2}<0 \end{cases}$
=$\begin{cases} 1-\frac{e^{-\lambda y_{2}}}{2} & , \text { if } y_{2} \ge 0 \\ \frac{e^{\lambda y_{2}}}{2} & , \text { if } y_{2}<0 \end{cases}$
since the cdf of an exponential random variable $X$ is $(1-e^{-\lambda x}), x \ge 0$.
Thus both $Y_{1}$ and $Y_{2}$ have the same distribution.
(b) $f_{Y_{1}}(y_{1})=\begin{cases} \frac{d}{d y_{1}}(1-\frac{e^{-\lambda y_{1}}}{2}) & , \text { if } y_{1} \ge 0 \\ \frac{d}{d y_{1}}(\frac{e^{\lambda y_{1}}}{2}) & , \text { if } y_{1}<0 \end{cases}$

= $\begin{cases} \frac{\lambda e^{-\lambda y_{1}}}{2} & \text { if } y_{1} \ge 0 \\ \frac{\lambda e^{\lambda y_{1}}}{2} & , \text { if } y_{1}<0 \end{cases}$

Similarly, for $Y_2$ .
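A quick simulation can make the equality of the two distributions tangible. The sketch below is a rough check under stated assumptions: it takes the common cdf to be $1-\frac{e^{-\lambda y}}{2}$ for $y \ge 0$ and $\frac{e^{\lambda y}}{2}$ for $y<0$, with $\lambda$ treated as the rate (which is the argument `random.expovariate` expects). The function names, the seed, the sample size and the evaluation points are illustrative choices.

```python
import math
import random

def common_cdf(y, rate):
    """Claimed common cdf of Y1 and Y2 (a Laplace distribution centred at 0)."""
    if y >= 0:
        return 1.0 - 0.5 * math.exp(-rate * y)
    return 0.5 * math.exp(rate * y)

def simulate(rate, size, seed=0):
    """Draw paired samples of Y1 = X1 - X2 and Y2 = R*X1 - (1-R)*X2."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(size):
        x1, x2 = rng.expovariate(rate), rng.expovariate(rate)
        r = 1 if rng.random() < 0.5 else 0  # Bernoulli(1/2), independent of x1 and x2
        y1.append(x1 - x2)
        y2.append(r * x1 - (1 - r) * x2)
    return y1, y2

def empirical_cdf(sample, y):
    return sum(v <= y for v in sample) / len(sample)

rate = 1.5  # illustrative value (assumption)
y1, y2 = simulate(rate, 100_000)
points = (-1.0, -0.3, 0.0, 0.4, 1.2)
for y in points:
    print(y, empirical_cdf(y1, y), empirical_cdf(y2, y), common_cdf(y, rate))
```

A simulation is no substitute for the proof, but agreement of the three columns at every point is a useful guard against algebra slips.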

Food For Thought

If $\theta \sim U(0, 2 \pi )$, then find the distribution of $\sin(\theta + {\theta}_{0} )$, where ${\theta}_{0} \in (0,2 \pi)$.


Life Testing Experiment | ISI MStat 2017 PSB Problem 5

This is a problem from the ISI MStat 2017 Entrance Examination. It tests how good your skills are in modeling a life testing experiment using an exponential distribution.

The Problem:

The lifetime in hours of each bulb manufactured by a particular company follows an independent exponential distribution with mean $\lambda$. We need to test the null hypothesis $H_0: \lambda=1000$ against $H_1:\lambda=500$.
A statistician sets up an experiment with $50$ bulbs, with $5$ bulbs in each of $10$ different locations, to examine their lifetimes.

To get quick preliminary results, the statistician decides to stop the experiment as soon as one bulb fails at each location. Let $Y_i$ denote the lifetime of the first bulb to fail at location $i$. Obtain the most powerful test of $H_0$ against $H_1$ based on $Y_1,Y_2, \ldots, Y_{10}$ and compute its power.

Prerequisites:

1. Properties of Exponential/Gamma distribution.

2. Neyman-Pearson Lemma.

3. Order Statistics.

Proof:

As it is clear from the arrangement of the bulbs, the first to fail(among 5 in a given location) has the smallest lifetime among the same.

That is, in more mathematical terms, for a location $i$, we can write $Y_i = \text{min}(X_{i1},X_{i2},..,X_{i5})$.

Here, $X_{ij}$ denotes the lifetime of the $j$th unit in the $i$th location, where $i=1,2, \ldots ,10$ and $j=1,2, \ldots ,5$.

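It is given that $X_{ij} \sim \text{Exp}(\lambda)$, where $\lambda$ is the mean lifetime.

Can you see that $Y_i \sim \text{Exp}(\lambda/5)$, i.e. exponential with mean $\lambda/5$? You may try to prove this general result:

If $X_1, \ldots ,X_n$ is a random sample from the $\text{Exp}(\lambda)$ distribution with mean $\lambda$,

then $X_{(1)}=\text{min}(X_1, \ldots ,X_n) \sim \text{Exp}(\lambda/n)$, since $P(X_{(1)} > x) = [P(X_1 > x)]^n = e^{-\frac{nx}{\lambda}}$.

So, now we have $Y_1,Y_2, \ldots ,Y_{10}$ in hand, independent and each exponential with mean $\lambda/5$.

The joint pdf is $f(\mathbf{y} )=\left(\frac{5}{\lambda}\right)^{10} e^{-\frac{5\sum_{i=1}^{10} y_i}{\lambda}}$, $y_i > 0$.

For testing $H_0: \lambda=1000$ against $H_1:\lambda=500$, we use the Neyman Pearson Lemma.

We have the critical region of the most powerful test as $\frac{f_{H_1}(\mathbf{y})}{f_{H_0}(\mathbf{y})} >c$

Here the ratio equals $2^{10} e^{-\frac{1}{200}\sum_{i=1}^{10} y_i}$, which is decreasing in $\sum y_i$, so after simplification the critical region comes out to be $\bar{Y} < K$ where $K$ is an appropriate constant. This makes sense: short lifetimes favour the smaller mean.

Also, see that $\bar{Y} \sim \text{Gamma}(10, \lambda/50)$ (shape $10$, scale $\lambda/50$), or equivalently $\frac{10\sum_{i=1}^{10} Y_i}{\lambda} \sim \chi^2_{20}$.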

Can you use this fact to find the value of $K$ using the size ($\alpha$) criterion ? (Exercise to the reader)

Also, find the power of the test.
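For readers who want to check their answer to the exercise, here is a hedged numerical sketch. It assumes $\lambda$ is the mean lifetime, as the problem states, so that each $Y_i$ is exponential with mean $\lambda/5$, $\frac{10\sum Y_i}{\lambda} \sim \chi^2_{20}$, and small values of $\sum Y_i$ favour $H_1$. The problem does not fix a size, so $\alpha = 0.05$ is an assumption, as are the function names. The chi-square cdf uses the closed form available for even degrees of freedom.

```python
import math

def chi2_cdf_even(x, half_df):
    """P(chi-square with 2*half_df degrees of freedom <= x), via the Poisson-sum identity."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, half_df):
        term *= half / k
        total += term
    return 1.0 - total

def chi2_lower_point(alpha, half_df):
    """Value q with P(chi-square <= q) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, half_df) < alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, half_df) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05                  # assumed size; not given in the problem
mean0, mean1 = 1000.0, 500.0  # lambda under H0 and H1
q = chi2_lower_point(alpha, 10)                 # lower alpha point of chi-square_20
total_cutoff = mean0 * q / 10.0                 # reject H0 when sum(Y_i) < total_cutoff
K = total_cutoff / 10.0                         # equivalently, reject when Ybar < K
power = chi2_cdf_even(10.0 * total_cutoff / mean1, 10)
print(K, power)
```

With a different choice of $\alpha$, only the quantile `q` changes; the power is always the $\chi^2_{20}$ cdf evaluated at twice that quantile, because the mean halves under $H_1$.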

Challenge Problem:

The exponential distribution is used widely to model lifetime of appliances. The following scenario is based on such a model.

Suppose electric bulbs have a lifetime distribution with pdf $f(t)=\lambda e^{-\lambda t}$ where $t \in [0, \infty)$ .

These bulbs are used individually for street lighting in a large number of posts. A bulb is replaced immediately after it burns out.

Let’s break down the problem in steps.

(i) Starting from time $t=0$, the process is observed till $t=T$. Can you calculate the expected number of replacements in a post during the interval $(0,T)$?

(ii) Hence, deduce $g(t) \text{dt}$, the probability of a bulb being replaced in $(t,t+ \text{dt})$ for $t < T$, irrespective of when the bulb was put in.

(iii) Next, suppose that at the end of the first interval of time $T$, all bulbs which were put in the posts before time $X < T$ and have not burned out are replaced by new ones, but the bulbs replaced after time $X$ continue to be used, provided, of course, that they have not burned out.

Prove that with such a mixture of old and new bulbs, the probability of a bulb having an expected lifetime > $\tau$ in the second interval of length $T$ is given by

$S_2(\tau)=\frac{1}{2}e^{-\lambda \tau}(1+ e^{-\lambda X})$

Also, try proving the general case where the lifetimes of the bulbs follow the pdf $f(t)$ . Here, $f(t)$ need not be the pdf of an exponential distribution .

You should be getting: $S_2(\tau)=(1-p)S_1(\tau) + \int_{0}^{x} g(T-x)S_1(x)S_1(\tau +x) \text{dx}$ ; where $\tau<T$

where, $p$ is the proportion of bulbs not replaced at time $t=T$ and $S_1(t)$ is the probability that a bulb has lifetime > $t$.


Application of Cauchy Functional Equations | ISI MStat 2019 PSB Problem 4

This problem is a beautiful application of probability theory and Cauchy functional equations. This is from ISI MStat 2019 PSB Problem 4.

Problem – Application of Cauchy Functional Equations

Let $X$ and $Y$ be independent and identically distributed random variables with mean $\mu>0$ and taking values in {$0,1,2, \ldots$}. Suppose, for all $m \geq 0$
$$\mathrm{P}(X=k | X+Y=m)=\frac{1}{m+1}, \quad k=0,1, \ldots, m$$
Find the distribution of $X$ in terms of $\mu$.

Prerequisites

• Conditional Probability
• Geometric Distribution
• $a, b, c, d$ are numbers such that $ac = b^2$ & $ad = bc$, then $a, b, c, d$ are in geometric progression.

Solution

Let $P(X =i) = p_i$ where $$\sum_{i=0}^{\infty} p_i = 1$$. Now, let’s calculate $P(X+Y = m)$.

$$P(X+Y = m) = \sum_{i=0}^{m} P(X+Y = m, X = i) = \sum_{i=0}^{m} P(Y = m-i, X = i) = \sum_{i=0}^{m} p_ip_{m-i}$$.

$$P( X = k|X+Y = m) = \frac{P( X = k, X+Y = m)}{P(X+Y = m)} = \frac{P( X = k, Y = m-k)}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{p_kp_{m-k}}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{1}{m+1}$$.

Hence,$$\forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0$$.

Thus, we get the following set of equations.

$$p_0p_2 = p_1^2$$ $$p_0p_3 = p_1p_2$$ Hence, by the third prerequisite, $p_0, p_1, p_2, p_3$ are in geometric progression.

Observe that as a result we get $p_1p_3 =p_2^2$. Next in line is:

$$p_1p_4 = p_2p_3$$ Thus, in a similar way, we get that $p_1, p_2, p_3, p_4$ are in geometric progression.

Hence, by induction, we will get that $p_k; k \geq 0$ form a geometric progression.

This is only possible if $X, Y$ ~ Geom($p$). We need to find $p$ now, but here $X$ counts the number of failures, and $p$ is the probability of success.

So, $E(X) = \frac{1-p}{p} = \mu \Rightarrow p = \frac{1}{\mu +1}$.
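The converse direction, that the geometric distribution with $p = \frac{1}{\mu+1}$ really does satisfy the given condition, can be confirmed with exact rational arithmetic. The sketch below is an illustration only; the function names, the value $\mu = \frac{3}{2}$ and the range of $m$ checked are assumptions.

```python
from fractions import Fraction

def geom_pmf(k, mu):
    """P(X = k) = p (1 - p)^k with p = 1 / (mu + 1), X counting failures."""
    p = Fraction(1) / (mu + 1)
    return p * (1 - p) ** k

def conditional_pmf(k, m, mu):
    """P(X = k | X + Y = m) for i.i.d. X, Y with the pmf above."""
    joint = geom_pmf(k, mu) * geom_pmf(m - k, mu)
    total = sum(geom_pmf(i, mu) * geom_pmf(m - i, mu) for i in range(m + 1))
    return joint / total

mu = Fraction(3, 2)  # illustrative mean (assumption)
checks = [conditional_pmf(k, m, mu) == Fraction(1, m + 1)
          for m in range(8) for k in range(m + 1)]
print(all(checks))
```

Using `Fraction` avoids any floating-point tolerance: each conditional probability is compared with $\frac{1}{m+1}$ exactly.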

Challenge Problem

So, can you guess what its continuous version will be?

It will be the exponential distribution. Prove it. But, what exactly is governing this exponential structure? What is the intuition behind it?

The Underlying Mathematics and Intuition

Observe that the obtained condition

$$\forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0.$$ can be written as follows

Writing $f(k) = p_k/p_0$, find all functions $f: \mathbf{N}_0 \to (0, \infty)$ such that $f(m)f(n) = f(m+n)$, subject to the restriction that the $p_k$ sum to 1.

The only solution has this geometric progression structure. This is a variant of the Cauchy Functional Equation. For the continuous case, it will be the exponential distribution.

Essentially, this is the functional equation that arises, if you march along to prove that the Geometric Random Variable is the only discrete distribution with the memoryless property.

Stay Tuned!