Testing of Hypothesis | ISI MStat 2016 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2016, involving the basic idea of the Type 1 error in testing of hypothesis, but focusing on the fundamental relationship between the Exponential distribution and the Geometric distribution.

The Problem:

Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) is a random sample from an exponential distribution with mean \(\lambda\).

Assume that the observed data is available on \(\left[X_{1}\right], \ldots,\left[X_{n}\right]\), instead of \(X_{1}, \ldots, X_{n},\) where \([x]\) denotes the largest integer less than or equal to \(x\).

Consider a test for \(H_{0}: \lambda=1\) vs \(H_{1}: \lambda>1\) which rejects \(H_{0}\) when \(\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .\)

Given \(\alpha \in(0,1),\) obtain values of \(c_{n}\) such that the size of the test converges to \(\alpha\) as \(n \rightarrow \infty\).


(a) Testing of Hypothesis

(b) Type 1 Error

(c) Exponential Distribution

(d) Relationship of Exponential Distribution and Geometric Distribution

(e) Central Limit Theorem


  • If \(X\) ~ Exponential(\(\lambda\)), then \(Y = [\frac{X}{a}]\) ~ Geom(\(p\)), where \( p = 1-e^{-\lambda a} \in(0,1) \)


\(Y\) is clearly discrete taking values in the set of non-negative integers, due to the flooring. Then, for any integer \(n \geq 0\) we have
$$P(Y=n)=P(X \in[an, a(n+1)))=\int_{a n}^{a(n+1)} \lambda e^{-\lambda x} \, dx=(1-p)^{n} p$$
where \(p=1-e^{-\lambda a} \in(0,1),\) as \(\lambda>0\) and \(a>0\).
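This lemma is easy to check empirically. Here is a minimal Python sketch (the rate `lam` and bin width `a` are illustrative choices, not fixed by the problem):

```python
import math
import random

# Sketch (lam and a are illustrative choices): if X ~ Exponential with
# rate lam, then floor(X / a) ~ Geom(p) with p = 1 - exp(-lam * a),
# counting the number of failures before the first success.
random.seed(0)
lam, a = 1.0, 1.0
n = 200_000
samples = [math.floor(random.expovariate(lam) / a) for _ in range(n)]

p = 1 - math.exp(-lam * a)
for k in range(4):
    empirical = sum(1 for s in samples if s == k) / n
    theoretical = (1 - p) ** k * p
    print(k, round(empirical, 4), round(theoretical, 4))
```

The empirical frequencies should sit close to the Geom(\(p\)) pmf for every \(k\).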

  • If \(X_i\) ~ Geom(\(p\)) independently, then \(\sum_{i = 1}^{n} X_i \) ~ NBinom(\(n,p\))
  • If \(X_i\) ~ Exponential(\(\lambda\)), then \(S_n = \sum_{i=1}^{n}\left[X_{i}\right]\) ~ NBinom(\(n,p\)), where \( p = 1-e^{-\lambda} \in(0,1) \)

Testing of Hypothesis

\(H_{0}: \lambda=1\) vs \(H_{1}: \lambda>1\)

We reject \(H_{0}\) when \(\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .\)

Here, the size of the test, i.e., the Type 1 error (for a simple null hypothesis), is \( \alpha_n = P(S_n > c_{n} \mid \lambda=1)\).

We want to select \(c_n\) such that \(\alpha_n \to \alpha\).

\(S_n\) ~ NBinom(\(n,p\)), where \( p = 1-e^{-1} \) under \(H_0\).

Now, \(\frac{\sqrt{n}(\frac{S_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}} \rightarrow Z \sim N(0,1)\) in distribution, by the Central Limit Theorem, since each \([X_i]\) has mean \(\frac{1-p}{p}\) and variance \(\frac{1-p}{p^2}\).

Thus, \( \alpha_n = P(S_n > c_{n} \mid \lambda=1) \rightarrow P(Z > \frac{\sqrt{n}(\frac{c_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}}) = \alpha\).

So we need \( \frac{\sqrt{n}(\frac{c_n}{n} - \frac{1-p}{p})}{\sqrt{\frac{1-p}{p^2}}} = z_{\alpha} \), where \(z_{\alpha}\) is the upper \(\alpha\)-point of \(N(0,1)\).

Solving for \(c_n\) gives \( c_n = \frac{n(1-p)}{p} + z_{\alpha}\frac{\sqrt{n(1-p)}}{p} \), where \( p = 1-e^{-1} \).
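As a numerical sanity check, the sketch below (with illustrative choices of \(n\), \(\alpha\), and the replication count) simulates the test under \(H_0\) with \(c_n = \frac{n(1-p)}{p} + z_{\alpha}\frac{\sqrt{n(1-p)}}{p}\) and estimates its size:

```python
import math
import random
from statistics import NormalDist

# Sketch (n, alpha, and reps are illustrative): with
# c_n = n(1-p)/p + z_alpha * sqrt(n(1-p))/p and p = 1 - e^{-1},
# the size P(S_n > c_n | lambda = 1) should approach alpha.
random.seed(1)
alpha = 0.05
n = 400
p = 1 - math.exp(-1.0)                       # p under H0 : lambda = 1
z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper alpha-point of N(0,1)
c_n = n * (1 - p) / p + z_alpha * math.sqrt(n * (1 - p)) / p

reps = 10_000
rejections = 0
for _ in range(reps):
    s_n = sum(math.floor(random.expovariate(1.0)) for _ in range(n))
    if s_n > c_n:
        rejections += 1
print(rejections / reps)  # should be close to alpha = 0.05
```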

Food for Thought

If \(X\) ~ Exponential(\(\lambda\)), then what is the distribution of \(\{X\}\), the fractional part of \(X\)? This question is crucial in getting back the Exponential distribution from the Geometric distribution.

In other words, the food for thought asks you how we can recover the Exponential distribution from the Geometric distribution.

Stay Tuned. Stay Blessed! See you in the next post.

ISI MStat PSB 2013 Problem 9 | Envelope Collector's Expenditure

This is a very simple and beautiful sample problem from ISI MStat PSB 2013 Problem 9. It is mainly based on the geometric distribution and its expectation. Try it!

Problem- ISI MStat PSB 2013 Problem 9

Envelopes are on sale for Rs. 30 each. Each envelope contains exactly one coupon, which can be one of four types, each type occurring with equal probability. Suppose you keep on buying envelopes and stop when you have collected all four types of coupons. What will be your expected expenditure?


Geometric Distribution

Expectation of geometric distribution

Basic counting

Solution :

This problem seems quite simple, and it is. One may argue that we can take a single random variable denoting the number of trials till the fourth success (or is it the third!!) and calculate its expectation. But I differ here, because I find it a lot easier to work with a sum of geometric random variables than with a negative binomial. (A negative binomial is actually a sum of finitely many geometrics!!)

So, here is what we will do: define 4 random variables \(X_i\) : the number of trials to get a type of coupon different from all the \((i-1)\) types drawn earlier, \(i=1,2,3,4\).

Now, since each type of coupon is equally likely to come, a common mistake is to assume that the success probability is \(\frac{1}{4}\) every time, making \(X_1,X_2,X_3,X_4\) i.i.d. Geometric(\(\frac{1}{4}\)), and this turns out to be a disaster!! So, be aware and observe keenly: at the first draw, any of the four types will come with probability 1, and thereafter we just need each of the remaining 3 types to appear at least once. So, here \(X_1=1\) always, and \(X_2 \sim Geo(\frac{3}{4})\) (because after the first trial, a success is getting any of the 3 types not yet drawn, making the success probability \(\frac{3}{4}\)). Similarly, \(X_3 \sim Geo(\frac{1}{2})\) and \(X_4 \sim Geo(\frac{1}{4})\).

Now, at the given rate of Rs. 30 per envelope, the expected expenditure is Rs. \(30 E(X_1+X_2+X_3+X_4)\).

Now, we know that \(E(X_2)=\frac{4}{3}\), \(E(X_3)=2\) and \(E(X_4)=4\) (why??)

So, \(E(X_1+X_2+X_3+X_4)=1+\frac{4}{3}+2+4=\frac{25}{3}\).

So, our required expectation is Rs. 250. Hence we are done !
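A short simulation sketch corroborates the answer (the seed and trial count are arbitrary choices):

```python
import random
from statistics import mean

# Sketch (seed and trial count are arbitrary): simulate buying envelopes
# until all 4 coupon types are collected, and compare the average
# expenditure with the computed value Rs. 30 * (25/3) = Rs. 250.
random.seed(2)

def envelopes_needed():
    seen = set()
    count = 0
    while len(seen) < 4:
        seen.add(random.randrange(4))  # each coupon type equally likely
        count += 1
    return count

trials = [envelopes_needed() for _ in range(50_000)]
print(30 * mean(trials))  # close to 250
```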

Food For Thought

Suppose each of the envelopes you collected has a unique 5-digit number on it. You pick an envelope at random; what is the chance that the number showing up has its digits in non-decreasing order?

How might the chances vary if you find a k-digit number on the envelope? Think it over!!


ISI MStat PSB 2004 Problem 1 | Games and Probability

This is a very beautiful sample problem from ISI MStat PSB 2004 Problem 1. Games are among the best ways to understand the role of chance in life; solving these kinds of problems always prompts me to think more about the uncertainties associated with the system. Think it over!!

Problem- ISI MStat PSB 2004 Problem 1

Suppose two teams play a series of games, each producing a winner and a loser, until one team has won two more games than the other. Let G be the number of games played. Assume each team has a chance of 0.5 of winning each game, independent of the results of the previous games.

(a) Find the probability distribution of G.

(b) Find the expected value of G.


Naive Probability

Counting principles

Geometric distribution.

Conditional expectation.

Solution :

While solving this kind of problem, the first thing we should do is observe the things which remain invariant.

Here, observe that the game always terminates with two consecutive wins by one team. Imagine the two teams are \(T_1\) and \(T_2\). If the first two matches are won by the same team, we are done; but say \(T_1\) won the first match and \(T_2\) won the second. Then it's a draw, the 2 matches played are basically lost, and we should start the counting afresh.

So, can I claim that G (as defined in the question) will always be even!! Verify this claim yourself!

So, consider the event G=g, where g is even. If the game terminates at the gth game, then it is evident from the logic established above that both the (g-1)th and the gth game were won by the winning team. So, among the first (g-2) games, both teams won equal numbers of games, with the score tied after every even number of games. So, the first (g-2) games can end at a draw in \( 2^{\frac{g-2}{2}} \) ways, and the last two matches can be won by either of the two teams, in 2 ways. And the g matches can result in \(2^g\) different arrangements of wins and losses (from the perspective of either team).

(a) So, P(G=g)= \( \frac{2 \cdot 2^{\frac{g-2}{2}}}{2^g}= \frac{1}{2^{\frac{g}{2}}} \) ; g=2,4,6,.......

Hence the distribution of G. Hold on! Isn't it looking like a geometric somehow?? Find it out!!

(b) While finding the expectation of G, we can use the conventional definition; and since the distribution of G is (somewhat) geometric, to be precise \( \frac{G}{2} \sim Geo(0.5) \), clearly the expectation will be 4. But I will take this chance to show another beautiful and elegant method, using conditional expectation. We will first condition on the result of the first match and develop a recursion, which I'm obsessed with. One may not find this method useful here, since the distribution of G is already known, but life is not about this problem, is it! What if the distribution is not known but the pattern is visible, if only you are ready to see it? Let's proceed.

Suppose, without loss of generality, that \(T_1\) won the first game: with probability 0.5, 1 game is gone and an expected E(G|\(T_1\) is leading by 1 game) number of further games is to be played. Similarly (OR), if \(T_2\) wins the first game, then with probability 0.5, 1 game is gone and an expected E(G|\(T_2\) is leading by 1 game) number of further games is to be played.

So, if we write the above words mathematically, it looks like:

E(G)= P(\(T_1\) wins the first game)(1+E(G|\(T_1\) is leading by 1 game)) + P(\(T_2\) wins the first game)(1+E(G|\(T_2\) is leading by 1 game)) ......................(*)

So, now we need to find E(G|\(T_1\) is leading by 1 game); the other term is the same due to symmetry!

So, consider the expected number of games to be played given that \(T_1\) is leading by 1 game. The next game can be a win for \(T_1\) with probability 0.5, ending the series; OR \(T_1\) can lose the game with probability 0.5, bringing the score back to a draw. From there, all the games played so far are wasted and we start counting afresh (loss of memory, remember!!): 1 game is gone and an expected E(G) number of games is still to follow before the series terminates. So, mathematically,

E(G|\(T_1\) is leading by 1 game) = P(\(T_1\) wins again)×1 + P(\(T_1\) loses, giving a draw)(1+E(G)) = 0.5+0.5+(0.5)E(G) = 1+(0.5)E(G).

Plugging this into (*), one obtains a recursion for E(G), and can calculate it as 4. So, on average, the game terminates after 4 matches.
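Here is a short simulation sketch corroborating both the distribution and the expectation (the replication count is an arbitrary choice):

```python
import random
from statistics import mean

# Sketch (replication count is arbitrary): play games until one team
# leads by two, and compare with P(G = g) = 2^(-g/2) for even g, E(G) = 4.
random.seed(3)

def games_played():
    lead = 0   # first team's wins minus second team's wins
    g = 0
    while abs(lead) < 2:
        lead += 1 if random.random() < 0.5 else -1
        g += 1
    return g

sims = [games_played() for _ in range(100_000)]
print(round(mean(sims), 3))                                  # close to 4
print(round(sum(1 for g in sims if g == 2) / len(sims), 3))  # close to 1/2
print(round(sum(1 for g in sims if g == 4) / len(sims), 3))  # close to 1/4
```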

Food For Thought

Can you generalize the above problem: what if the chance of winning each match, for one of the two teams, is some p, (0<p<1)? Try it!

Wait!! Before leaving, let's toss some coins.

You toss a fair coin repeatedly. What is the expected number of tosses you have to perform to get the pattern HH (heads, heads) for the first time? What about the expected number of tosses when your pattern of interest is TH (or HT)?? For which pattern do you think you need to spend more time?? Does your intuition corroborate the mathematical conclusions?? If not, what do you think misled your intuition??
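If you would like to interrogate a computer before trusting your intuition, here is a minimal simulation sketch (the pattern strings and replication counts are illustrative choices):

```python
import random
from statistics import mean

# Sketch: estimate the expected number of tosses until the first "HH"
# and until the first "HT" appear, to test your intuition numerically.
random.seed(4)

def tosses_until(pattern):
    last = ""
    count = 0
    while True:
        toss = "H" if random.random() < 0.5 else "T"
        count += 1
        if last + toss == pattern:
            return count
        last = toss

e_hh = mean(tosses_until("HH") for _ in range(100_000))
e_ht = mean(tosses_until("HT") for _ in range(100_000))
print(round(e_hh, 2))  # known theoretical answer: 6
print(round(e_ht, 2))  # known theoretical answer: 4
```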

Think it over and over!! You are dealing with one of the most beautiful perspectives on uncertainty!!


ISI MStat PSB 2014 Problem 9 | Hypothesis Testing

This is another beautiful sample problem from ISI MStat PSB 2014 Problem 9. It is based on testing a simple hypothesis, but reveals and uses a very cute property of the Geometric distribution, which I prefer calling a sister to Loss of Memory. Give it a try!

Problem- ISI MStat PSB 2014 Problem 9

Let \( X_1 \sim Geo(p_1)\) and \( X_2 \sim Geo(p_2)\) be independent random variables, where Geo(p) refers to the Geometric distribution whose p.m.f. f is given by,

\(f(k)=p(1-p)^k, k=0,1,.....\)

We are interested in testing the null hypothesis \(H_o : p_1=p_2\) against the alternative \( H_1: p_1<p_2\). Intuitively it is clear that we should reject if \(X_1\) is large, but unfortunately, we cannot compute the cut-off because the distribution of \(X_1\) under \(H_o\) depends on the unknown (common) value of \(p_1\) and \(p_2\).

(a) Let \(Y= X_1 +X_2\). Find the conditional distribution of \( X_1|Y=y\) when \(p_1=p_2\).

(b) Based on the result obtained in (a), derive a level 0.05 test for \(H_o\) against \(H_1\), which rejects when \(X_1\) is large.


Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution.

Conditional Distribution.

Simple Hypothesis Testing.

Solution :

Well, part (a) is quite easy, but interesting and elegant, so I'm leaving it as an exercise for you to have the fun. Hint: verify whether the required distribution is Discrete Uniform or not! Once you are done, proceed.

Now, part (b) is even more interesting, because here we will not use the conventional way of analyzing the distributions of \(X_1\) and \(X_2\); instead, we will concentrate on the conditional distribution of \( X_1 \mid Y=y\)! But why?

One reason for this strategy is already given in the question itself, but the other reason is more interesting to observe: if you are done with (a), then by now you have found that the conditional distribution of \(X_1 \mid Y=y\) is free of any parameter (i.e., the distribution of \(X_1\) loses all information about the parameter \(p_1\) when conditioned on Y=y; \(p_1=p_2\) is a necessary condition here). This parameter-free conditional distribution is nothing but Discrete Uniform on {0,1,....,y}, where y is the observed sum of \(X_1\) and \(X_2\).

So, under \(H_o: p_1=p_2\), the distribution of \(X_1 \mid Y=y\) is free of the common parameter \(p_1 = p_2\). And clearly, as stated in the problem itself, it is intuitively understandable that a large value of \(X_1\) exhibits evidence against \(H_o\): a large realized value of \(X_1\) means successes do not come very often, i.e., \(p_1\) is smaller.

So, there will be strong evidence against \(H_o\) if \(X_1 > c\), for some constant \(c \le y\), where y is the given value of the sum \(X_1+X_2\).

So, for a level 0.05 test, the test will reject \(H_o\) for large values of \(X_1\), such that,

\( P_{H_o}( X_1 > c| Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y - 0.05 .\)

So, we reject \(H_o\) at level 0.05 when we observe \( X_1 > 0.95y - 0.05 \), where \(X_1+X_2 = y\) is given. That's it!
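A simulation sketch of the key fact behind this test, that \(X_1 \mid Y=y\) is discrete uniform whatever the common \(p\) is (the values of `p` and `y` below are arbitrary choices):

```python
import random
from collections import Counter

# Sketch (p and y are arbitrary choices): with X1, X2 i.i.d. Geom(p)
# counting failures (pmf p(1-p)^k), the conditional distribution of
# X1 given X1 + X2 = y should be discrete uniform on {0, 1, ..., y}.
random.seed(5)

def geom(p):
    k = 0
    while random.random() > p:   # failure with probability 1 - p
        k += 1
    return k

p, y = 0.3, 4
counts = Counter()
for _ in range(400_000):
    x1, x2 = geom(p), geom(p)
    if x1 + x2 == y:
        counts[x1] += 1

total = sum(counts.values())
print({k: round(counts[k] / total, 3) for k in sorted(counts)})
# each of 0, ..., 4 should have conditional probability close to 1/5
```

Re-running with a different `p` leaves the conditional frequencies essentially unchanged, which is exactly why conditioning on \(Y=y\) removes the nuisance parameter.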

Food For Thought

Can you show that for this same \(X_1 \) and \( X_2\) ,

\(P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n) \)

considering \(p_1=p_2=p\) , where n=0,1,.... What about the converse? Does it hold? Find out!

But avoid losing memory; its beauty is exclusive to the Geometric (and Exponential)!!


Application of Cauchy Functional Equations | ISI MStat 2019 PSB Problem 4

This problem is a beautiful application of probability theory and Cauchy functional equations. This is from ISI MStat 2019 PSB Problem 4.

Problem - Application of Cauchy Functional Equations

Let \(X\) and \(Y\) be independent and identically distributed random variables with mean \(\mu>0\) and taking values in {\(0,1,2, \ldots\)}. Suppose, for all \(m \geq 0\)
$$\mathrm{P}(X=k \mid X+Y=m)=\frac{1}{m+1}, \quad k=0,1, \ldots, m$$
Find the distribution of \(X\) in terms of \(\mu\).



Let \( P(X =i) = p_i\) where $$\sum_{i=0}^{\infty} p_i = 1$$. Now, let's calculate \(P(X+Y = m)\).

$$P(X+Y = m) = \sum_{i=0}^{m} P(X+Y = m, X = i) = \sum_{i=0}^{m} P(Y = m-i, X = i) = \sum_{i=0}^{m} p_ip_{m-i}$$.

$$P( X = k|X+Y = m) = \frac{P( X = k, X+Y = m)}{P(X+Y = m)} = \frac{P( X = k, Y = m-k)}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{p_kp_{m-k}}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{1}{m+1}$$.

Hence,$$ \forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0$$.

Thus, we get the following set of equations.

$$ p_0p_2 = p_1^2$$ $$ p_0p_3 = p_1p_2$$ Hence, by the third prerequisite, \(p_0, p_1, p_2, p_3\) are in geometric progression.

Observe that as a result we get \( p_1p_3 =p_2^2 \). Next in line is:

$$ p_1p_4 = p_2p_3$$ Thus, in a similar way, we get that \(p_1, p_2, p_3, p_4\) are in geometric progression.

Hence, by induction, we will get that \(p_k; k \geq 0\) form a geometric progression.

This is only possible if \(X, Y\) ~ Geom(\( p\)), where \(X\) counts the number of failures before the first success and \(p\) is the probability of success. We now need to find \(p\).

So, \(E(X) = \frac{1-p}{p} = \mu \Rightarrow p = \frac{1}{\mu +1}\).
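A quick empirical sketch of this conclusion (the value of `mu` is an arbitrary choice, and `(k, m)` is just one illustrative pair):

```python
import random
from statistics import mean

# Sketch (mu is an arbitrary choice): for X, Y i.i.d. Geom(p) counting
# failures, with p = 1/(mu + 1), check that E(X) = mu and that
# P(X = k | X + Y = m) is about 1/(m + 1) for an illustrative (k, m).
random.seed(6)
mu = 2.0
p = 1 / (mu + 1)

def geom(p):
    k = 0
    while random.random() > p:   # failure with probability 1 - p
        k += 1
    return k

xs = [geom(p) for _ in range(200_000)]
print(round(mean(xs), 2))        # close to mu = 2

m, k = 3, 1
hits = cond = 0
for _ in range(300_000):
    x, y = geom(p), geom(p)
    if x + y == m:
        hits += 1
        if x == k:
            cond += 1
print(round(cond / hits, 3))     # close to 1/(m+1) = 0.25
```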

Challenge Problem

So, can you guess what its continuous version will be?

It will be the exponential distribution. Prove it. But, what exactly is governing this exponential structure? What is the intuition behind it?

The Underlying Mathematics and Intuition

Observe that the obtained condition

$$ \forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0.$$ can be written as follows

Find all functions \(f: \mathbf{N}_0 \to [0,\infty)\) such that \(f(m)f(n) = f(m+n)\) (here \(f(m) = p_m/p_0\)), together with the restriction that the \(p_m\) sum to 1.

The only solution is the geometric progression structure. This is a variant of the Cauchy functional equation. In the continuous case, the same argument yields the Exponential distribution.

Essentially, this is the functional equation that arises, if you march along to prove that the Geometric Random Variable is the only discrete distribution with the memoryless property.

Stay Tuned!